Rethink Priorities’ Welfare Range Estimates
Key Takeaways
We offer welfare range estimates for 11 farmed species: pigs, chickens, carp, salmon, octopuses, shrimp, crayfish, crabs, bees, black soldier flies, and silkworms.
These estimates are, essentially, estimates of the differences in the possible intensities of these animals’ pleasures and pains relative to humans’ pleasures and pains. Then, we add a number of controversial (albeit plausible) philosophical assumptions (including hedonism, valence symmetry, and others discussed here) to reach conclusions about animals’ welfare ranges relative to humans’ welfare range.
Given hedonism and conditional on sentience, we think (credence: 0.7) that none of the vertebrate nonhuman animals of interest have a welfare range that’s more than double the size of any of the others. While carp and salmon have lower scores than pigs and chickens, we suspect that’s largely due to a lack of research.
Given hedonism and conditional on sentience, we think (credence: 0.65) that the welfare ranges of humans and the vertebrate animals of interest are within an order of magnitude of one another.
Given hedonism and conditional on sentience, we think (credence 0.6) that all the invertebrates of interest have welfare ranges within two orders of magnitude of the vertebrate nonhuman animals of interest. Invertebrates are so diverse and we know so little about them; hence, our caution.
Our view is that the estimates we’ve provided should be seen as placeholders—albeit, we submit, the best such placeholders available. We’re providing a starting point for more rigorous, empirically-driven research into animals’ welfare ranges. At the same time, we’re offering guidance for decisions that have to be made long before that research is finished.
Introduction
This is the eighth post in the Moral Weight Project Sequence. The aim of the sequence is to provide an overview of the research that Rethink Priorities conducted between May 2021 and October 2022 on interspecific cause prioritization—i.e., making resource allocation decisions across species. The aim of this post is to share our welfare range estimates.
This post builds on all the others in the Moral Weight Project Sequence. In the first, we explained how we understand welfare ranges and how they might be used to make cross-species cost-effectiveness estimates. In the second, we introduced the Welfare Range Table, which reported the results of a literature review covering over 90 empirical traits across 11 farmed species. In the third, we suggested a way to quantify the impact of assuming hedonism on our welfare range estimates. In the fourth, we explained why we’re skeptical of using neuron counts as our sole proxy for animals’ moral weights. In the fifth and sixth, we explained why we aren’t convinced by some revisionary ways that people try to alter humans’ and animals’ moral weights by proposing that there are more subjects per organism than we might initially assume. In the seventh, we argued that “animal-friendly” results shouldn’t be that surprising given the Moral Weight Project’s assumptions—nor are they a good reason to think that the Project’s assumptions are mistaken.
In what follows, we’ll briefly recap our understanding of welfare ranges and our proposed way of using them. Then, we’ll summarize our methodology and respond to some questions and objections.
How can we compare benefits to the members of different species?
Many EA organizations use DALYs-averted as a unit of goodness. So, the Moral Weight Project tries to express animals’ welfare level changes in terms of DALYs-averted. This lets people conduct standard cost-effectiveness analyses across human and animal interventions. (What follows is a compressed overview of our strategy. For more detail, please see our Introduction to the Moral Weight Project.)
In the context of a cost-effectiveness analysis, a “moral weight discount” is a function that takes some amount of some species’ welfare as an input and has some number of DALYs as an output. So, the Moral Weight Project tries to provide “moral weight discounts” for 11 commercially-significant species. The interpretation of this function depends on the moral assumptions in play. The Moral Weight Project assumes hedonism (welfare is determined wholly by positively and negatively valenced experiences) and unitarianism (equal amounts of welfare count equally, regardless of whose welfare it is). Given hedonism and unitarianism, a species’s moral weight is how much welfare its members can realize—i.e., its members’ capacity for welfare. That is, everyone’s welfare counts the same, but some may be able to realize more welfare than others.
Capacity for welfare = welfare range × lifespan. An individual’s welfare range is the difference between the best and worst welfare states the individual can realize. In other words, assume we can assign a positive number to the best welfare state the individual can realize and a negative number to the worst welfare state the individual can realize. The difference between them is the individual’s welfare range.
We’re ultimately trying to convert changes in welfare levels into DALYs. So, the relevant “best” human welfare state is the average welfare level of the average human in full health. The relevant “best” animal welfare states will be analogous.
For simplicity’s sake, we assume that humans’ welfare range is symmetrical around the neutral point. So, if the “best” welfare state for a human is represented by some arbitrary positive number, then the “worst” welfare state is represented by the negation of that number. (For reasons we sketch below, this assumption matters less than you might think. For some preliminary thoughts on the symmetry assumption, see this report.)
Welfare ranges allow us to convert species-relative welfare assessments, understood as percentage changes in the portions of animals’ welfare ranges, into a common unit. To illustrate, let’s make the following assumptions:
Chickens’ welfare range is 10% of humans’ welfare range.
Over the course of a year, the average chicken is about half as badly off as they could be in conventional cages (they’re at the ~50% mark in the negative portion of their welfare range).
Over the course of a year, the average chicken is about a quarter as badly off as they could be in a cage-free system (they’re at the ~25% mark in the negative portion of their welfare range).
Given these assumptions, we can calculate the welfare gain of a cage-free campaign in DALY-equivalents averted (a short code sketch of the same arithmetic follows these steps):
Assuming symmetry around the neutral point, the negative portion of chickens’ welfare range is 10% of humans’ positive welfare range. (For instance, if humans’ welfare range is 100 and chickens’ welfare range is 10, humans range from −50 to 50 and chickens range from −5 to 5. So, the negative portion of chickens’ welfare range is still 10% of humans’ welfare range.)
Given our assumptions about the welfare impacts of the two production systems, the move from conventional cages to aviary systems averts an amount of welfare equivalent to 25% of the average chicken’s negative welfare range. (Continuing with the numbers mentioned in the previous step, it moves chickens from −2.5 to −1.25).
So, assuming symmetry around the neutral point, 25% of chickens’ negative welfare range is equivalent to 2.5% (10% × 25%) of humans’ positive welfare range.
By definition, averting a DALY averts the loss of an amount of welfare equivalent to the positive portion of humans’ welfare range for a year.
So, assuming symmetry around the neutral point, the move from conventional cages to aviary systems averts the equivalent of 0.025 DALYs per chicken per year on average.
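To make these steps concrete, here is a minimal Python sketch of the same arithmetic (illustrative only, not RP’s actual code; the function name and the 100-unit scale are our own choices, and the scale cancels out):

```python
def dalys_averted_per_animal_year(relative_welfare_range, start_fraction, end_fraction):
    """Convert a move within an animal's negative welfare range into
    DALY-equivalents averted per animal per year, assuming welfare ranges
    are symmetric around the neutral point."""
    human_range = 100.0                       # arbitrary units; they cancel out
    human_positive_portion = human_range / 2  # symmetry: one DALY-year = 50 units
    animal_negative_portion = relative_welfare_range * human_range / 2
    welfare_gain = (start_fraction - end_fraction) * animal_negative_portion
    return welfare_gain / human_positive_portion

# The cage-free example: a 10% relative range, moving from the ~50% mark
# to the ~25% mark of the negative portion of the welfare range.
print(dalys_averted_per_animal_year(0.10, 0.50, 0.25))  # 0.025
```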
The symmetry assumption doesn’t matter for our welfare range estimates. Instead, it matters for estimates of the total number of DALY-equivalents averted. Suppose, for instance, that humans’ welfare range is 0 to 100 (on net, their welfare is always neutral or positive) whereas chickens’ welfare range is −9 to 1 (their welfare can be 9x worse than it can be good). Our estimate of chickens’ relative welfare range would be the same: 10%. However, such an asymmetry would alter the amount of welfare represented by “25% of chickens’ negative welfare range” (2.25 vs. 1.25 welfare units on the numbers above), and hence the DALY conversion (0.0225 vs. 0.025 DALYs per chicken per year on average). To make the implications clear, we’ve developed a farmed animal welfare cost-effectiveness BOTEC that allows users to input their own assumptions about the skews of animals’ welfare ranges to convert welfare changes into DALY-equivalents averted.
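The same sketch can be generalized to explicit, possibly asymmetric endpoints, which is essentially what the BOTEC’s inputs allow (again, an illustration under our own simplifying assumptions rather than the spreadsheet’s exact formulas):

```python
def dalys_averted_asymmetric(human_low, human_high, animal_low, animal_high,
                             start_fraction, end_fraction):
    """Like the function above, but with explicit welfare range endpoints.
    Only the portions relevant to this calculation are used: humans' positive
    portion (the DALY unit) and the animal's negative portion."""
    human_positive_portion = max(human_high, 0.0)
    animal_negative_portion = max(-animal_low, 0.0)
    welfare_gain = (start_fraction - end_fraction) * animal_negative_portion
    return welfare_gain / human_positive_portion

# Symmetric case from the walkthrough above:
print(dalys_averted_asymmetric(-50, 50, -5, 5, 0.50, 0.25))  # 0.025
# Asymmetric case from this paragraph (humans 0 to 100, chickens -9 to 1):
print(dalys_averted_asymmetric(0, 100, -9, 1, 0.50, 0.25))   # 0.0225
```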
Some welfare range estimates
What follows are some probability-of-sentience- and rate-of-subjective-experience-adjusted welfare range estimates. These numbers are based on:
estimates of the probability of sentience for the following taxa
welfare range estimates conditional on sentience for the following taxa, and
credence-adjusted estimates of the rate of subjective experience (based on Jason Schukraft’s prior work on the rate of subjective experience, about which more below). A sketch of how these three inputs compose follows the table.
Species | 5th-percentile | 50th-percentile | 95th-percentile |
Pigs | 0.005 | 0.515 | 1.031 |
Chickens | 0.002 | 0.332 | 0.869 |
Octopuses | 0.004 | 0.213 | 1.471 |
Carp | 0 | 0.089 | 0.568 |
Bees | 0 | 0.071 | 0.461 |
Salmon | 0 | 0.056 | 0.513 |
Crayfish | 0 | 0.038 | 0.491 |
Shrimp | 0 | 0.031 | 1.149 |
Crabs | 0 | 0.023 | 0.414 |
Black Soldier Flies | 0 | 0.013 | 0.196 |
Silkworms | 0 | 0.002 | 0.073 |
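To illustrate how these three inputs compose, here is a hedged Monte Carlo sketch. Every distribution and parameter below is a made-up placeholder, not one of RP’s actual inputs, which are given in the technical document linked below.

```python
import random

P_SENTIENCE = 0.5  # placeholder probability of sentience for some species

def sample_adjusted_welfare_range():
    if random.random() > P_SENTIENCE:
        return 0.0  # not sentient in this draw, so no welfare range
    conditional_range = random.betavariate(2, 4)  # placeholder range given sentience
    rate_adjustment = random.uniform(0.8, 1.2)    # placeholder rate-of-experience multiplier
    return conditional_range * rate_adjustment

draws = sorted(sample_adjusted_welfare_range() for _ in range(100_000))
print(draws[5_000], draws[50_000], draws[95_000])  # 5th, 50th, and 95th percentiles
```

Note that a sufficiently low probability of sentience drives the 5th percentile to zero, which is why several rows in the table above bottom out at 0.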
We provide the technical details in this document. We now turn to the more general methodology behind these numbers.
How did we estimate relative welfare ranges?
Given hedonism, an individual’s welfare range is the difference between the welfare level associated with the most intense positively valenced experience the individual can realize and the welfare level associated with the most intense negatively valenced experience that the individual can realize. So, we looked for evidence of variation in the capacities that generate positively and negatively valenced experiences.
Since there are no agreed-upon objective measures of the intensity of valenced states, we pursued a four-step strategy:
Make some plausible assumptions about the evolutionary function of valenced experiences
Given those functions, identify a lot of empirical traits that could serve as proxies for variation with respect to those functions
Survey the literature for evidence about those traits
Aggregate the results
There are many theories of valence, not all of which are mutually exclusive. For instance, some think that valenced experiences represent information in a motivationally-salient way (“That’s good” / “That’s bad” / “That’s really good” / etc.; Cutter & Tye 2011), others that valenced experiences provide a common currency for decision-making (“A feels better than B” / “C feels worse than D”; Ginsburg & Jablonka 2019), and others still that they facilitate learning (“If I do X, I feel good” / “If I do Y, I feel bad”; Damasio & Carvalho 2013). In all three cases, there are potential links between valence and conceptual or representational complexity, decision-making complexity, and affective (emotional) richness.
We conducted a large literature review for traits that could serve as indicators of conceptual or representational complexity, decision-making complexity, and affective richness, involving over 100 qualitative and quantitative proxies across 11 species. The literature review is available here. Descriptions of the proxies are available here (and for the “quantitative proxies” model, here).
We aggregated the results. However, aggregation raises lots of thorny methodological issues. So, we opted to build several models. For a variety of reasons, though, we ultimately opted not to include them all in our estimates: some could be accused of stacking the deck in favor of animals (the Equality Model), some were missing too much data (the Quantitative Model), and some involved assumptions that went beyond the key assumptions of the Moral Weight Project (the Grouped Proxy Model and the JND Model). We then took the remaining models and used Monte Carlo simulations to estimate the distribution of welfare ranges, as detailed here.
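Concretely, the aggregation step can be pictured as sampling across the retained models. In the sketch below, the samplers are placeholders and the equal model weighting is our assumption for illustration, not a claim about RP’s exact procedure.

```python
import random

# Placeholder samplers standing in for the retained models' output distributions.
model_samplers = [
    lambda: random.betavariate(2, 5),
    lambda: random.betavariate(1, 3),
    lambda: random.betavariate(3, 6),
]

def sample_mixed_welfare_range():
    return random.choice(model_samplers)()  # draw a model, then draw from it

draws = sorted(sample_mixed_welfare_range() for _ in range(100_000))
print(draws[50_000])  # median of the mixture distribution
```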
Jason Schukraft estimated that there’s a ~70% chance that there exist morally relevant differences in the rate of subjective experience and a ~40% chance that critical flicker-fusion frequency (CFF) values roughly track the rate of subjective experience under ideal conditions. So, we applied a credence-discounted, CFF-based adjustment to each species’ welfare range estimate. Since this proxy suggests that some animals have a faster rate of subjective experience than humans, it supports greater-than-human welfare range estimates on some models.
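One hedged way to implement such a credence-discounted adjustment is sketched below. The 0.7 and 0.4 credences come from the paragraph above; the multiplicative functional form is our illustration and not necessarily RP’s exact formula.

```python
def cff_adjusted(range_estimate, cff_ratio, p_differences=0.7, p_cff_tracks=0.4):
    # Probability that the CFF proxy applies at all: morally relevant rate
    # differences exist AND CFF roughly tracks the rate of experience.
    p_apply = p_differences * p_cff_tracks
    return (1 - p_apply) * range_estimate + p_apply * range_estimate * cff_ratio

# A hypothetical species with 1.5x the human CFF and a 0.3 welfare range estimate:
print(cff_adjusted(0.3, 1.5))  # 0.342
```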
Finally, we adjusted our estimates based on our best guess estimates of the probability of sentience. We generated those estimates by extending and updating Rethink Priorities’ Invertebrate Sentience Table and then aggregating the results as detailed here.
Questions about and objections to the Moral Weight Project’s methodology
“I don’t share this project’s assumptions. Can’t I just ignore the results?”
We don’t think so. First, if unitarianism is false, then it would be reasonable to discount our estimates by some factor or other. However, the alternative—hierarchicalism, according to which some kinds of welfare matter more than others or some individuals’ welfare matters more than others’ welfare—is very hard to defend. (To see this, consider the many reviews of the most systematic defense of hierarchicalism, which identify deep problems with the proposal.)
Second, and as we’ve argued, rejecting hedonism might lead you to reduce our non-human animal estimates by ~⅔, but not by much more than that. This is because positively and negatively valenced experiences are very important even on most non-hedonist theories of welfare.
Relatedly, even if you reject both unitarianism and hedonism, our estimates would still serve as a baseline. A version of the Moral Weight Project with different philosophical assumptions would build on the methodology developed and implemented here—not start from scratch.
“So you’re saying that one person = ~three chickens?”
No. We’re estimating the relative peak intensities of different animals’ valenced states at a given time. So, if a given animal has a welfare range of 0.5 (and we assume that welfare ranges are symmetrical around the neutral point), that means something like, “The best and worst experiences that this animal can have are half as intense as the best and worst experiences that a human can have”—remembering that, in this context, the welfare level associated with “best experiences that a human can have” is the average welfare level of the average human in full health, which, presumably, is lower than the most intense pleasure humans are physically capable of experiencing.
Because we’re estimating the relative intensities of valenced states at a time, not over time, you have to factor in lifespan to make individual-to-individual comparisons. Suppose, then, that the animal just mentioned—the one with a welfare range of 0.5—has a lifespan of 10 years, whereas the average human has a lifespan of 80. Then, humans have, on average, 16x this animal’s capacity for welfare; equivalently, its capacity for welfare is 0.0625x a human’s capacity for welfare.
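The arithmetic, using the earlier formula (capacity for welfare = welfare range × lifespan):

```python
animal_capacity = 0.5 * 10  # welfare range 0.5, lifespan 10 years -> 5.0
human_capacity = 1.0 * 80   # welfare range 1.0, lifespan 80 years -> 80.0
print(human_capacity / animal_capacity)  # 16.0: humans have 16x the capacity
print(animal_capacity / human_capacity)  # 0.0625: the animal's capacity relative to a human's
```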
However, while there are decision-making contexts where total capacity for welfare matters, they aren’t the most pressing ones. In practice, we rarely compare the value of creating animal lives with the value of creating human lives. Instead, we’re usually comparing either improving animal welfare (welfare reforms) or preventing animals from coming into existence (diet change → reduction in production levels) with improving human welfare or saving human lives. Whatever combination we consider, total capacity for welfare isn’t relevant. Instead, we want to know things like how much suffering we can avert via some welfare reform vs. how many years of human life will this intervention save. Welfare ranges can be helpful in answering the former question.
“I can’t believe that bees beat salmon!”
We also find it implausible that bees have larger welfare ranges than salmon. But (a) we’re also worried about pro-vertebrate bias; (b) bees are really impressive; (c) there’s a great deal of overlap in the plausible welfare ranges for these two types of animals, so we aren’t claiming that their welfare ranges are significantly different; and (d) we don’t know how to adjust the scores in a non-arbitrary way. So, we’ve let the result stand. (We’d make similar points in response to: “I can’t believe that octopuses beat carp!”)
“Even granting the project’s assumptions, it seems obvious that [insert species] have much smaller welfare ranges than you’re suggesting. If the empirical evidence doesn’t demonstrate that, isn’t it a problem with the empirical evidence?”
No. First, the empirical evidence is our only objective guide to animals’ abilities—helping us avoid the twin mistakes of anthropomorphism (attributing human characteristics to nonhumans) and what Frans de Waal calls “anthropodenial”—i.e., “the a priori rejection of shared characteristics between humans and animals.” So, we’re inclined to defer to it.
This deference, plus the assumption of hedonism, does a lot of work in explaining our estimates. Given our deference to the empirical literature, we aren’t positing differences if we can’t cite justifications for them. Given hedonism, lots of apparent differences between humans and animals don’t matter, as they’re irrelevant to the intensities of the valenced states. So, if our results seem counterintuitive, it may be that implicit disagreements about these assumptions explain that reaction.
Second, recall that we’re treating missing data as evidence against sentience and for larger welfare range differences. So, while the empirical evidence is limited, we aren’t using that fact to stack the deck in animals’ favor—quite the opposite.
Third, even if the results are counterintuitive, that is not necessarily a reason to reject the estimates (as we argue here). After all, it’s an open question whether we should trust any of our intuitions about animals’ ability to generate welfare, especially if those intuitions are driven by thinking about the practical implications of these estimates. There are many, many other assumptions that need to be in place before these estimates have any practical implications at all. So, if the practical implications are counterintuitive, those other assumptions are just as much to blame.
“I’m skeptical that [insert proxy] has much to do with welfare ranges.”
In some cases, we share that skepticism; we readily grant that the proxy list could be refined. However, for each proxy, there is either a version of hedonism or a theory about valenced states on which it bears on differences in welfare ranges, and we couldn’t resolve all those theoretical issues in the time available. Moreover, we could only reject particular proxies if we had independent ways to check whether our welfare range estimates are accurate—and, plainly, we don’t. So, we opted to err on the side of inclusiveness, which made the project enormous. Even so, the list isn’t exhaustive: there are many other traits that could have been included—and, in some cases, perhaps ought to have been.
If we can make progress on the relevant theoretical issues, we can refine our proxy list. Until then, we’re navigating uncertainty by incorporating as many reasonable approaches as possible.
“How could there be as many ‘unknowns’ as you’re suggesting? After all, in this context, ‘not-unknown’ just means ‘above or below 50% however slightly’—and surely that’s a low bar.”
We thought it was important to have domain experts review the literature whenever possible. However, domain experts are academics, and academics are socialized into a community where it’s inappropriate to make a positive claim (“Pigs have this trait” or “Pigs lack that trait”) without being able to establish that claim to the satisfaction of their peers. There are good reasons to value this socialization in the present case. For instance, it’s difficult to predict which traits an organism will have based on its other traits. Moreover, it’s difficult to predict whether one kind of organism will have a trait because a related kind of organism does. Still, even though the probability ranges we mentioned earlier set a very low bar for “lean yes” and “lean no” (above and below 50%, respectively), we defaulted to “unknown” when we couldn’t find any relevant literature. Even if our approach is defensible, other reasonable literature reviewers might have reached more “lean yes” and “lean no” assessments than we did.
“You’re assessing the proxies as either present or absent, but many of them obviously come either in degrees or in qualitatively different forms.”
This is indeed a limitation; we readily acknowledge that many of the proxies are relatively coarse-grained. Consider a trait like reversal learning: namely, the ability to suppress a reward-related response, which involves stopping one behavior and switching to another. This trait comes in degrees: some animals can learn to suppress a reward-related response in fewer trials; and, having learned to suppress a reward-related response at all, some can suppress their response more quickly. A more sophisticated version of the project would account for this variation.
However, it isn’t clear what to do about it, as the empirical literature doesn’t provide straightforward ways to score animals on many of these proxies. This problem might be solvable in the case of reversal learning specifically, since we can, at the very least, measure the rate at which the animal learns to suppress the reward-related response. In other cases, the problem is much harder. For instance, parental care is obviously different in humans than in chickens. But we don’t see how to quantify the difference without making many controversial assumptions that, in all likelihood, will simply smuggle in a range of pro-human biases. So, given the current state of knowledge, the present/absent approach seems best.
“It isn’t even clear to me that [insert species] are sentient. So, why should I accept your estimate of their (ostensible) welfare range?”
You shouldn’t. Instead, you should adjust our probability-of-sentience-conditioned estimate based on your credence in the hypothesis that [insert species] are sentient.
That being said, there is deep uncertainty about consciousness generally and sentience specifically. In the face of that uncertainty, we think there’s no good argument for assigning a credence below 0.3 (30%) to the hypothesis that normal adult pigs, chickens, carp, and salmon are sentient. Likewise, we think there’s no good argument for assigning a credence below 0.01 (1%) to the hypothesis that normal adult members of the invertebrate species of interest are sentient. So, skepticism about sentience might lead you to discount our estimates, but probably by fairly modest rates.
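For readers who want to plug in their own credences, here is a minimal sketch of the adjustment suggested above. The linear rescaling is a simplification, since RP’s estimates are full distributions rather than point values.

```python
def adjust_for_your_credence(conditional_welfare_range, your_sentience_credence):
    # Multiply RP's welfare range estimate conditional on sentience by your own
    # probability that the species is sentient (an expected-value approximation).
    return conditional_welfare_range * your_sentience_credence

# E.g., a conditional-on-sentience estimate of 0.2 with a 10% sentience credence:
print(adjust_for_your_credence(0.2, 0.10))  # 0.02
```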
“Your literature review didn’t turn up many negative results. However, there are lots of proxies such that it’s implausible that many animals have them. So, your welfare range estimates are probably high.”
This is a good objection. However, it isn’t clear how aggressively to discount our results because of it. After all, we know so little about animals’ lives. In many cases, no one has cared enough to investigate welfare-relevant traits; in many other cases, no one knows how to investigate them. Moreover, the history of research on animals suggests that we’ll be surprised by their abilities. So, of the unknown proxies for any given species, we should expect to find at least some positive results—and perhaps many positive results. The upshot is that while it might make sense to discount our estimates by some modest rate (e.g., 25%–50%), we don’t think it would be reasonable to discount them by, say, 90%, much less 99%.
In any case, we should stress that we aren’t inflating our estimates: we’re just following what seems to us to be a reasonable methodology, premised on deferring to the state of current knowledge. As we learn more about these animals, we should—and will—update.
In future work, we could make inferences about proxy possession from more distant taxa. Or, we could try using a modern missing data method to account for any potential systematic trends in why some species-model pairs have no extant evidence.
“Shouldn’t you give neuron counts more weight in your estimates?”
We discuss neuron counts in depth here. In brief, there are many reasons to be skeptical about the value of neuron counts as proxies for welfare ranges. Moreover, some ways of incorporating neuron counts would increase our welfare range estimates for invertebrates, not decrease them. So, we already regard the weight currently assigned as a kind of compromise with community credences.
“You don’t have a model that’s based on the possibility that the number of conscious systems in a brain scales with neuron counts (i.e., ‘the Conscious Subsystems Hypothesis’).”
We discuss the conscious subsystems hypothesis in depth here. The conscious subsystems hypothesis is a highly controversial philosophical thesis. So, given our methodological commitment to letting the empirical evidence drive the results, we decided not to include this hypothesis in our calculations.
How confident are we in our estimates and what would change them?
No one should be very confident in any estimate of a nonhuman animal’s welfare range. We know far too little for that. However, we’re reasonably confident about some things.
Given hedonism and conditional on sentience, we think (credence: 0.7) that none of the vertebrate nonhuman animals of interest have a welfare range that’s more than double the size of any of the others. While carp and salmon have lower scores than pigs and chickens, we suspect that’s largely due to a lack of research.
Given hedonism and conditional on sentience, we think (credence: 0.65) that the welfare ranges of humans and the vertebrate animals of interest are within an order of magnitude of one another.
While humans have some unique and impressive abilities, those abilities have histories; they didn’t just pop into existence when humans came on the scene. Many nonhuman animals have precursors to these abilities (or variants on them, adapted to animals’ particular ecological niches).
Moreover, and more importantly, it isn’t clear that many of these impressive abilities make much difference to the intensity of the valenced states that humans can realize. Instead, humans seem to realize a much greater variety of valenced states. If hedonism is true, though, variety probably doesn’t matter; intensity does the work.
Given hedonism and conditional on sentience, we think (credence 0.6) that all the invertebrates of interest have welfare ranges within two orders of magnitude of the vertebrate nonhuman animals of interest. Invertebrates are so diverse and we know so little about them; hence, our caution.
As for what would change our mind, the main thing is research on the proxies. In principle, research on the proxies could alter our welfare range estimates significantly. Right now, the proxies are fairly coarse-grained and we aren’t confident about their relative importance. If, for instance, we were to learn there are ten levels of reversal learning and that shrimp only reach the second, that could significantly alter our results. Likewise, if we were to learn that having a self-concept is 10x more important than parental care when it comes to estimating differences in welfare ranges, that could significantly alter our results.
Conclusion
Our view is that the estimates we’ve provided are placeholders. Our estimates will change as we learn more about all animals, human and nonhuman. They will change as we learn more about the various traits we share with nonhuman animals and the various traits we don’t share with them. They will change with advances in comparative cognition, neuroscience, philosophy, and various other fields. We’re under no illusions that we’re providing the last word on this topic. Instead, we’re providing a starting point for more rigorous, empirically-driven research into animals’ welfare ranges. At the same time, we’re offering guidance for decisions that have to be made long before that research is finished.
Acknowledgments
This research is a project of Rethink Priorities. It was written by Bob Fischer. For help at many different stages of this project, thanks to Meghan Barrett, Marcus Davis, Laura Duffy, Jamie Elsey, Leigh Gaffney, Michelle Lavery, Rachael Miller, Martina Schiestl, Alex Schnell, Jason Schukraft, Will McAuliffe, Adam Shriver, Michael St. Jules, Travis Timmerman, and Anna Trevarthen. If you’re interested in RP’s work, you can learn more by visiting our research database. For regular updates, please consider subscribing to our newsletter.
Hi Bob & team,
Really great work. Regardless of my specific disagreements, I do think calculating moral weights for animals is literally some of the highest value work the EA community can do, because without such weights we can’t compare animal welfare causes to human-related global health/longtermism causes—and hence cannot identify and direct resources towards the most important problems. And I say this as someone who has always donated to human causes over animal ones, and who is not, in fact, vegan.
With respect to the post and the related discussion:
(1) Fundamentally, the quantitative proxy model seems conceptually sound to me.
(2) I do disagree with the idea that your results are robust to different theories of welfare. For example, I myself reject hedonism and accept a broader view of welfare (given that we care about a broad range of things beyond happiness, e.g. life/freedom/achievement/love/whatever). If (a) such broad welfarist views are correct, (b) you place a sufficiently high weight on the other elements of welfare (e.g. life per se, even if neutral valenced), and (c) you don’t believe animals can enjoy said elements of welfare (e.g. if most animals aren’t cognitively sophisticated enough to have preferences over continued existence), then an additional healthy year of human life would plausibly be worth a lot more than an equivalent animal year even after accounting for similar degrees of suffering and the relevant moral weights as calculated.
(3) I would like to say, for the record, that a lot of the criticism you’re getting (and I don’t exempt myself here) is probably subject to a lot of motivated reasoning. I am personally uncertain as to the degree to which I should discount my own conclusions for this reason.
(4) My main concern, as someone who does human-related cause prioritization research, is the meat eater argument and whether helping to save human lives is net negative from an overall POV, given the adverse consequences for animal suffering. I am moderately optimistic that this is not so, and that saving human lives is net positive (as we want/need it to be). Having very roughly run the numbers myself using RP’s unadjusted moral weights (i.e. not taking into account point 2 above) and inputting other relevant data (e.g. on per capita consumption rate of meat), my approximate sense is that in saving lives we’re basically buying 1 full week of healthy human life for around 6 days of chicken suffering, or about 2 days of equivalent human suffering—which is worth it.
Thanks for the kind words about the project, Joel! Thanks too for these thoughtful and gracious comments.
1. I hear you re: the quantitative proxy model. I commissioned the research for that one specially because I thought it would be valuable. However, it was just so difficult to find information. To even begin making the calculations work, we had to semi-arbitrarily fill in a lot of information. Ultimately, we decided that there just wasn’t enough to go on.
2. My question about non-hedonist theories of welfare is always the same: just how much do non-hedonic goods and bads increase humans’ welfare range relative to animals’ welfare ranges? As you know, I think that even if hedonic goods and bads aren’t all of welfare, they’re a lot of it (as we argue here). But suppose you think that non-hedonic goods and bads increase humans’ welfare range 100x over all other animals. In many cost-effectiveness calculations, that would still make corporate campaigns look really good.
3. I appreciate your saying this. I should acknowledge that I’m not above motivated reasoning either, having spent a lot of the last 12 years working on animal-related issues. In my own defense, I’ve often been an animal-friendly critic of pro-animal arguments, so I think I’m reasonably well-placed to do this work. Still, we all need to be aware of our biases.
4. This is a very interesting result; thanks for sharing it. I’ve heard of others reaching the same conclusion, though I haven’t seen their models. If you’re willing, I’d love to see the calculations. But no pressure at all.
Hi Joel,
Hedonism is compatible with caring about “life/freedom/achievement/love/whatever”, because all of those describe sets of conscious experiences, and hedonism is about valuing conscious experiences. I cannot think of something I value independently of conscious experiences, but I would welcome counterexamples.
There’s the standard philosophical counterexample of the experience machine, including Joshua Greene’s reformulation that addresses status quo bias. But basically, the idea is this—would you rather the world was real, or just an illusion while you’re trapped as a brain in a vat (with the subjective sensory experience itself otherwise identical)? Almost certainly (and most people give this answer), you’ll want the world to be real. That’s because we don’t just want to think that we’re free/successful/in a loving relationship—we actually want to be all those things.
In less philosophical terms, you can think about how you would not want your friends and family to actually hate you (even if you couldn’t tell the difference). And that would also be why people care about having non-moral impact even after they’re dead (e.g. authors hoping their posthumously published book is successful, or an athlete wanting their achievements to stand the test of time and not be bested at the next competition, or a mathematician wanting to actually prove some conjecture and not just think they did).
Thanks for the reply, Joel!
It depends on the specific properties of the real and simulated world, but my answer would certainly be guided by hedonic considerations:
My personal hedonic utility would be the same in the simulated and real worlds, so it would not be a deciding factor.
If I were the only (sentient) being in the simulated world, and there were lots of (sentient) beings in the real world, the absolute value of the total hedonic utility would be much larger for the real world.
As a result, I would prefer:
The real world if I expected the mean experience per being there to be positive (i.e. positive total hedonic utility).
The simulated world if I expected the mean experience per being in the real world to be negative (i.e. negative total hedonic utility), and I had positive experiences myself in the simulated world.
Hedonism says all that matters is conscious experiences, but that does not mean we should be indifferent between 2 worlds where our personal conscious experiences are the same. We still have to look into the experiences of other beings, unless we are perfectly egoistic, which I do not think we should be.
For me, a true counterexample to hedonism would have to present 2 worlds in which expected total (not personal) hedonistic utility (ETHU) were the same, and people still preferred one of them over the other. However, since we do not understand well how to calculate ETHU, we can only ensure 2 worlds have the same amount of it if they are exactly the same, in which case it does not make sense to prefer one over the other.
I agree. However, as I commented here, that is only an argument against egoistic hedonism, not altruistic hedonism (which is the one I support).
You can imagine a) everyone in their own experience machine isolated from everyone else, so that all the other “people” inside are not conscious (but the people believe the others are conscious, and there’s no risk they’ll find out they aren’t), or b) people genuinely interacting with each other (in the real world, or virtual reality), making real connections with other real people. I think most people would prefer the latter for themselves, even if it makes them somewhat worse off. An impartial hedonistic view would recommend disregarding these preferences and putting everyone in the isolated experience machines anyway.
Thanks for the clarification! Some thoughts:
Not related to your point, but I would like to note it seems quite extreme to reject the application of hedonism in the context of welfare range estimates based on such thought experiment.
It is unclear to me whether ETHU is greater in a) or b). It depends on whether it is more efficient to produce it via experience machines or genuine interactions (I suppose utility per being would be higher with experience machines, but maybe not utility per unit resources). So I do not think people preferring a) or b) is good evidence that there is something else which matters besides ETHU.
It does not seem possible to make a hard distinction between a) and b). I am only able to perceive reality via my own conscious experience, so there is a sense in which my body is in fact an experience machine.
I believe most people preferring b) over a) is very weak evidence that b) is better than a). Our intuitions are biased towards assessing the thought experiment based on how the words used to describe it make us feel. As a 1st approximation, I think people would be thinking about whether “genuine” and “real” sound better than “machine” and “isolated”, and they do, so I am not surprised most people prefer b).
Being genuinely loved rather than just believing you are loved could matter to your welfare even if it doesn’t affect your conscious experiences. Knowing the truth, even if it makes no difference to your experiences. Actually achieving something rather than falsely believing you achieved it.
Thanks for the examples, Michael!
I would say they work as counterexamples to egoistic hedonism, but not to altruistic hedonism (the one I support). In each pair of situations you described, my mental states (and therefore personal hedonic utility) would be the same, but the experiences of others around me would be quite different (and so would total hedonic utility):
Pretending to love should feel quite different from loving, and being fake generally leads to worse outcomes.
One is better positioned to improve the mental states of others if one knows what is true.
Actually achieving something means actually improving the mental states of others (to the extent one is altruistic), rather than only believing one did so.
For these reasons, rejecting wireheading is also compatible with hedonism. A priori, it does not seem like the best way to help others. One can specify in thought experiments that “everyone else[’s hedonic utility] is taken care of”, but I think it is quite hard to condition human answers on that, given that lots of our experiences go against the idea that having delusional experiences is both optimal for us and for others.
Would love to see the draft calculations from point 4 as well.
Hi Ula,
FYI, this and this could also be relevant for analysing the meat eater problem. The posts are not updated with RP’s moral weight estimates, but the models should still be useful (and I am happy to update them with RP’s estimates if you think it is useful).
Will DM on Slack!
Hi Joel,
Great to know you are considering impacts on animals! Even if the meat eater problem is not a major concern according to your calculations, has CEARCH considered that the best animal welfare interventions may be orders of magnitude more cost-effective than GiveWell’s top charities? CEARCH uses a cost-effectiveness bar of 10 times the cost-effectiveness of GiveWell’s top charities, but I think this is very low. I estimated corporate campaigns for broiler welfare are 1.71 k times as cost-effective as the lowest cost to save a life among GW’s top charities.
With respect to the meat eater problem, I think the conclusion depends on the country. This influences the consumption per capita of animals, how much of each animal species is consumed, and the conditions of the animals. High income countries will tend to have greater consumption per capita and worse conditions, given the greater prevalence of factory-farming. For reference:
I estimated the annual suffering of all farmed animals combined is 4.64 times the annual happiness of all humans combined, which goes against your conclusion. For simplicity, I set the welfare per time as a fraction of the welfare range of each farmed animal of any species to a value I got for broilers in a reformed scenario.
However, I estimated accounting for farmed animals only decreases the cost-effectiveness of GiveWell’s top charities by 14.5 %, which is in line with your conclusion. Yet, I am underestimating the reduction in cost-effectiveness due to using current consumption, given it will tend to increase with economic growth.
I think considering impacts on animals may well affect CEARCH’s prioritisation:
Interventions in different countries may have super different impacts on animals (as illustrated by the 2 distinct conclusions above). I guess this is more relevant for CEARCH than GiveWell because I have the impression you have been assessing interventions whose beneficiaries are from a set of less homogeneous countries, which means the impacts on animals will vary more, and therefore cannot be neglected so lightly.
Interventions to extend life have different implications from interventions to improve quality of life. In general, interventions which improve quality of life without affecting lifespan and income much will have smaller impacts on animals (at least nearterm, i.e. neglecting how population size changes economic growth, and hence the trajectory of the consumption of animals). This is relevant to CEARCH because you have looked not only into interventions mostly saving lives and increasing income, but also into mental health.
I also encourage you to publish your estimates regarding the meat eater problem. I am not aware of any evaluator or grantmaker (aligned with effective altruism or not) having ever published a cost-effectiveness analysis of an intervention to improve human welfare which explicitly considered the impacts on farmed animal welfare (although I am aware of another besides you which has an internal analysis). So CEARCH would be the 1st to do so. For the reasons above, I think it would also be great if you included impacts on animals as a standard feature of your cost-effectiveness analyses.
At risk of jeopardizing EA’s hard-won reputation of relentless internal criticism:
Even setting aside its object-level impact-relevant criteria (truth, importance, etc.), this is just enormously impressive in terms of both magnitude and quality. The post itself gives us readers an anchor on which to latch critiques, questions, and comments, so it’s easy to forget that each step or decision in the whole methodology had to be chosen from an enormous space of possibilities. And this looks—at least on a first read—like very many consecutive well-made steps and decisions.
Thanks for the kind words, Aaron!
I’m curating the post. I should note that I think I agree with a big chunk of Joel’s comment.
I notice I’m quite confused about the symmetry assumption. For example: suppose we have two animals — M and N — and they’re both at the worst end of their welfare ranges (~0th percentile) and have equal lifespans (and there are no indirect effects). M has double the welfare range of N. If we assume that their welfare ranges are symmetric around the neutral point, then replacing one M with one N is similar to moving M from the 0th percentile of its welfare range to the 25th. If, however, their welfare ranges aren’t symmetric — say M’s is skewed very positive and N’s is skewed very negative — then we could actually be making the situation worse. In the BOTEC spreadsheet you linked, you seem to resolve this by requiring people to state the specific endpoints of the welfare ranges relative to the neutral point. If that’s the main solution, it seems very important to be clear about where the neutral point is for different animals, and that seems really hard — I’m curious if you have thoughts on how to approach that. (Maybe you assume that welfare ranges are generally close to symmetric, or asymmetric in similar ways? If so, I would like to understand why you think that.) It’s also very possible that I misunderstood something; I was reading things fast and haven’t read all the linked posts and documents.
To make sure that I understand (the broad strokes of the rest of the framework) correctly; suppose I want to use this framework and these welfare range estimates to help me decide between two (completely hypothetical, unrealistic) options — assuming that every animal’s welfare range is symmetric around the neutral point: (A) getting someone to buy the equivalent of a cage-free chicken instead of a caged chicken vs (B) getting someone to buy a farmed salmon instead of a farmed carp. Is it right that I’d now need to incorporate (estimates for) the following additional information?
To understand the welfare impact on the animals in question
Lifespans of the animals[1]
Where exactly on their respective welfare ranges they are, on average (in the situations I’m considering)[2]
The other stuff
Indirect effects
E.g. how (many) other animals are affected by the farming processes — feed (insects/fish), how many die in the farming process, etc.
Costs of the interventions
(In particular, I worry a bit that people might not be tracking 1a and 1b — you seem to worry about this, too, given the sections on things like “so you’re saying that one person =~ three chickens?” — and I’d like to make sure that I actually understand correctly (and that others do, too).)
Broiler chickens live for 5-7 weeks, apparently. Farmed carp apparently live for around a year, and farmed salmon live for around 1-3 years. (These numbers are from quick Google searches —definitely don’t trust them.)
A highly technical diagram is below. Note that the diagram represents the ranges as if they’re all symmetric — as if each animal can experience as much bad as good — whereas that isn’t necessarily true. The welfare impact of choice (A) and (B) is the highlighted interval (assuming completely made-up numbers), multiplying by the lifespans of the animals, and adjusting for indirect effects.
Given the lifespans of the animals in question, switching to salmon seems harmful ((even) without accounting for indirect effects or costs).
Fantastic questions, Lizka! And these images are great. I need to get much better at (literally) illustrating my thinking. I very much appreciate your taking the time!
Here are some replies:
Replacing an M with an N. This is a great observation. Of course, there may not be many real-life cases with the structure you’re describing. However, one possibility is in animal research. Many people think that you ought to use “simpler” animals over “more complex” animals for research purposes—e.g., you ought to experiment on fruit flies over pigs. Suppose that fruit flies have smaller welfare ranges than pigs and that both have symmetrical welfare ranges. Then, if you’re going to do awful things to one or the other, such that each would be at the bottom of their respective welfare range, it would follow that it’s better to experiment on fruit flies.
Assessing the neutral point. You’re right that this is important. It’s also really hard. However, we’re trying to tackle this problem now. Our strategy is multi-pronged, identifying various lines of evidence that might be relevant. For instance, we’re looking at the Welfare Footprint Data and trying to figure out what it might imply about whether layer hens have net negative lives. We’re looking at when vets recommend euthanasia for dogs and cats and applying those standards to farmed animals. We’re looking at tradeoff thought experiments and some of the survey data they’ve generated. And so on. Early days, but we hope to have something on the Forum about this over the summer.
Symmetry vs. asymmetry. This is another hard problem. In brief, though, we take symmetry to be the default simply because of our uncertainty. Ultimately, it’s a really hard empirical question that requires time we didn’t have. (Anyone want to fund more work on this!?) As we say in the post, though, it’s a relatively minor issue compared to lots of others. Some people probably think that we’re orders of magnitude off in our estimates, whereas symmetry vs. asymmetry will make, at most, a 2x difference to the amount of welfare at stake. That isn’t nothing, but it probably won’t swing the analysis.
The “caged vs. cage-free chicken / carp vs. salmon” examples. This is a great question. We’ve done a lot on this, though none of it’s publicly available yet. Basically, though, you’re correct about the information you’d want. Of course, as your note indicates, we don’t care about natural lifespan; we care about time to slaughter. And while it’s very difficult to know where an animal is in its welfare range, we don’t think it’s in principle inestimable. Basically, if you think that caged hens are living about the worst life a chicken can live, you say that they’re at the bottom end of their welfare range. And if you think cage-free hens have net negative lives, but they’re only about half as badly off as they could be, then you can infer that you’re getting a 50% gain relative to chickens’ negative welfare range in the switch from caged to cage-free. And so on. This is all imperfect, but at least it provides a coherent methodology for making these assessments. Moreover, it’s a methodology that forces us to be explicit about disagreements re: the neutral point and the relative welfare levels of animals in different systems, which I regard as a good thing.
(I could believe octopuses beat carps, because octopuses seem unusually cognitively sophisticated among animals.)
I’d guess the main explanation for this (at least sentience-adjusted, if that’s what’s meant here), which may have biased your results against salmon and carp, is that you used the prior probability for crab sentience (43% mean, 31% median from table 3 in the doc) as the prior probability for salmon and carp sentience, and your posterior probabilities of sentience are generally very similar to the priors (compare tables 3 and 4 in the doc). Honeybees, fruit flies, crabs, crayfish, salmon and carp all ended up with similar sentience probabilities, but I’d assign higher probabilities to salmon and carp than to the others. You estimated octopuses to be about 2x as likely to be sentient as salmon and carp, according to both your priors and posteriors, with means and medians roughly between 73% and 78% for octopuses. On the other hand, your sentience-conditioned welfare ranges didn’t differ too much between the fish, octopuses, and bees. It’s worth pointing out that Luke Muehlhauser had significantly higher probabilities for rainbow trout (70%; in the salmonid family, like salmon) than Gazami crabs (20%) and fruit flies (25%), and you could use his prior for rainbow trout for salmon and carp instead (or something in between). That being said, his probabilities were generated in different ways from yours, so that might introduce other biases. You could instead use your prior for octopuses (or something in between). Or, most consistent with your methodology, you could have the authors of the original estimates for RP just estimate these probabilities directly, with or without the data you gathered for salmon and carp. Any of these would be relatively small fixes.
As an aside, should we interpret this sentience probability work as not primarily refining your old estimates (since the posteriors and priors are very similar), but as adding other species and further modelling your uncertainty?
There may be some other smaller potential sources of bias that contributed here, but I don’t expect them to have been that important:
I’m guessing salmon and carp (and apparently zebrafish, which seem to often have been used when direct evidence wasn’t available, maybe more for carp) are less well-studied than bees, so your conservative assumptions of assigning 0 to “unknown” for both probabilities of sentience and welfare ranges conditional on sentience may count more against them. For example, there were some studies found for “cognitive sophistication” for honeybees but not for salmon or carp, and more found for “mood state behaviors” for honeybees than for salmon and carp in your new Sentience Table. For your Welfare Range Table, bees had fewer “unknowns” for cognitive proxies than salmon and carp, but more for hedonic proxies and representational proxies.
One possible quick-ish fix would be to use a prior for the presence/absence of proxies across animals based on the ones for which there are studies (possibly just those you collected evidence for), although this may worsen other biases, like publication bias, and it requires you to decide how to weigh different animal species (but uniformly across those you collected evidence for is one way, although somewhat arbitrary). A minimal sketch of this fix appears below, after this list.
Another quick-ish fix could be to make more assumptions between species you gathered evidence for, e.g. if a fruit fly has some capacity, I’d expect fish to, as well, and if some mammal is missing some capacity, I’d expect salmon and carp to not have it either. This may be too strong, but you did use the crab sentience prior for the fish.
Longer fixes could use more sophisticated missing data methods.
You may have underestimated salmon and carp neuron counts by around 100x.
Also, among the proxies you’ve used, I’d be inclined to give almost all of my weight to a handful of hedonic proxies, namely panic-like behavior, hyperalgesia, PTSD-like behavior, prioritizes pain response in relevant context, and motivational trade-off (a cognitive proxy) as indicating the extremes of welfare conditional on sentience, and roughly in that order by weight. The first three all came up “unknown” due to no studies for bees, but there were a few studies suggesting their presence (and none negative) for the fish. Giving almost all of your weight to these proxies would favor the fish over bees. That being said, I wouldn’t be that surprised to find out that bees display those behaviors, too, because I also think bees are very impressive and behaviorally complex.
I might use joy-like behavior and play behavior for the other end of the welfare range, but I expect them to be overshadowed by the intense suffering indicators above, and I don’t expect them to differ too much across the species. There was evidence of play behavior in all three, but only evidence of joy-like behavior in carp.
The next proxies that could make much difference that I think could matter on some models (although I don’t assign them much weight) would be neuron counts and the number of just-noticeable differences, and neuron counts would also favor the fish.
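Here’s a minimal sketch of the base-rate idea from the first quick-ish fix above. The species, proxies, and values are made up purely for illustration, and the uniform weighting across studied species is just one arbitrary choice:

```python
import numpy as np
import pandas as pd

# Hypothetical proxy table: 1 = present, 0 = absent, NaN = unknown.
# Species, proxies, and values are made up purely for illustration.
proxies = pd.DataFrame(
    {"panic_like_behavior":   [1.0, np.nan, 1.0, np.nan],
     "motivational_tradeoff": [1.0, 1.0, np.nan, 0.0]},
    index=["honeybee", "salmon", "carp", "fruit_fly"],
)

# Instead of scoring every "unknown" as 0, fill each proxy's unknowns with
# the base rate among the species that were actually studied (uniform
# weighting across studied species; simple, though somewhat arbitrary).
base_rates = proxies.mean(skipna=True)   # per-proxy presence rate
imputed = proxies.fillna(base_rates)     # fill unknowns column-wise
print(imputed)
```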
Thanks for all this, Michael. Lots to say here, but I think the key point is that we don’t place much weight on these particular numbers and, as you well know and have capably demonstrated, we could get different numbers (and ordinal rankings) with various small changes to the methodology. The main point to keep in mind (which I say not for your sake, but for others, as I know you realize this) is that we’d probably get even smaller differences between welfare ranges with many of those changes. One of the main reasons we get large differences between humans and many invertebrates is because of the sheer number of proxies and the focus on cognitive proxies. There’s an argument to be given for that move, but it doesn’t matter here. The point is just that if we were to focus on the hedonic proxies you mention, there would be smaller differences—and it would be more plausible that those would be narrowed further by further research.
If I had more time, I would love to build even more models to aggregate various sets of proxies. But only so many hours in the day!
Hi Bob and RP team,
I’ve been working on a comparative analysis of the knock-on effects of bivalve aquaculture versus crop cultivation, to try to provide a more definitive answer to how eating oysters/mussels compares morally to eating plants. I was hoping I could describe how I’d currently apply the RP team’s welfare range estimates, and would welcome your feedback and/or suggestions. Our dialogue could prove useful for others seeking to incorporate these estimates into their own projects.
For bivalve aquaculture, the knock-on moral patients include (but are not limited to) zooplankton, crustaceans, and fish. Crop cultivation affects some small mammals, birds, and amphibians, though its effect on insect suffering is likely to dominate.
RP’s invertebrate sentience estimates give a <1% probability of zooplankton or plant sentience, so we can ignore them for simplicity (with apologies to Brian Tomasik). The sea hare is the organism most similar to the bivalve for which sentience estimates are given, and it is estimated that a sea hare is less likely to be sentient than an individual insect. Although the sign of crop cultivation’s impact on insect suffering is unclear, the magnitude seems likely to dominate the effect of bivalve aquaculture on the bivalves themselves, so we can ignore them too for simplicity.
The next steps might be:
Calculate welfare ranges:
For bivalve aquaculture, use carp, salmon, crayfish, shrimp, and crabs to calculate a welfare range for the effect of bivalve aquaculture on marine populations.
Use chickens as a model species to calculate a welfare range for the effect of crop cultivation on vertebrate populations.
For the effect of crop cultivation on insect suffering, I might just toss this problem on to future researchers. I’m only doing this as a side project, and given the sheer complexity of the considerations at play, I’m worried I might publish something which inadvertently increases insect suffering instead of decreasing it.
For several moral views (negative utilitarianism, symmetric utilitarianism) and several perspectives of the value of a typical wild animal’s life (net negative, net neutral, net positive), extract relevant conclusions. (e.g. if bivalve aquaculture is robustly shown to increase marine populations, given Brian’s arguments that crop cultivation likely reduces vertebrate populations, a negative utilitarian who views wild animal lives as net negative may want to oppose bivalve consumption.)
(Of course, I’d have to mention longtermist considerations. The effect of norms surrounding animal consumption on moral circle expansion could be crucial. So could the effect of these consumption practices on climate change or on food security.)
Thanks for your comment, Ariel, and sorry for the slow reply! What you’ve described sounds great as far as it goes. However, my basic view here—which I offer with sincere appreciation for the project you’re describing and a genuine desire to see it completed—is that the uncertainties are so far-reaching that, while we can get clearer about the conditions under which, say, a negative utilitarian will condemn bivalve consumption, we basically have no idea which condition we’re in. So, I think that the most valuable thing right now would be to write up specific empirical research questions and value-aligned ways of operationalizing the key concepts. Then, we should be hunting for graduate students and early-career researchers who might be willing to do the empirical work in exchange for relatively small amounts of funding. (Many academics are cheap dates.) From my perspective, EA has gone just about as far as it can already on these kinds of questions without more substantive collaborations with entomologists, aquatic biologists, ecologists, and so on.
All that said, I’ll stress that I completely agree with you about the importance of getting answers here! I just think we’re at the point where we can’t make much more progress toward them from the armchair.
Question about uncertainty modeling (tagging @Laura Duffy here since she might be the best person to answer it):
How do you think about the different models of welfare capacity that were averaged together to make the mixture model? Is your assumption that one of these models is really the true correct model in all species (and you don’t yet know which one it is), or that the different constituent models might each be more or less true for describing the welfare capacity for each individual species?
My context for asking this is in thinking about quantifying the uncertainty for a function that depends on the welfare ranges of two different species (e.g. y = f(welfare range of shrimp, welfare range of pigs)). It’s tempting to just treat the welfare ranges of shrimp and pigs as independent variables and to then sample each of them from their respective mixture model distribution. But if we think there’s one true model and the mixture model is just reflecting uncertainty as to what that is, the welfare ranges of shrimp and pigs should be treated as correlated variables. One might then obtain an estimate of the uncertainty in y by generating samples as follows:
Randomly pick one of the 9 models in the mixture model as the true model
Sample the welfare range of both shrimp and pigs from their distributions for the selected constituent model
Compute y = f(welfare range of shrimp, welfare range of pigs)
Repeat steps 1-3 until the desired # of samples is obtained
I could also imagine computing the covariance of the different species’ welfare ranges and directly generating samples as correlated random variables.
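If it’s helpful to other readers, here’s a minimal sketch of the hierarchical procedure above (steps 1–4) in Python. The model names, distribution families, and parameters are all made up for illustration; the actual per-model distributions and weights live in RP’s code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-model welfare range distributions (lognormal parameters
# chosen arbitrarily for illustration; the real models differ).
models = {
    "neuron_count":          {"shrimp": (-6.0, 1.0), "pig": (-1.5, 0.5)},
    "undiluted_experiences": {"shrimp": (0.2, 0.5),  "pig": (0.0, 0.3)},
    "equality":              {"shrimp": (0.0, 0.1),  "pig": (0.0, 0.1)},
    # ...the remaining models, each with equal weight
}

def sample_y(f, n_samples=10_000):
    """Correlated sampling: pick one model per draw, then sample both
    species' welfare ranges from that same model."""
    names = list(models)
    ys = np.empty(n_samples)
    for i in range(n_samples):
        m = models[rng.choice(names)]          # step 1: pick a model
        shrimp = rng.lognormal(*m["shrimp"])   # step 2: sample both species
        pig = rng.lognormal(*m["pig"])         #         from that same model
        ys[i] = f(shrimp, pig)                 # step 3: compute y
    return ys                                  # step 4: repeat to n_samples

# Example: some welfare impact that depends on both species' ranges.
ys = sample_y(lambda shrimp, pig: 2.0 * shrimp - 0.5 * pig)
print(np.percentile(ys, [2.5, 50, 97.5]))
```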
Thanks a bunch for your question, Matt. I can speak to the philosophical side of this; Laura has some practical comments below. I do think you’re right—and in fact our team discussed the possibility—that we ought to be treating the welfare range estimates as correlated variables. However, we weren’t totally sure that that’s the best way forward, as it may treat the models with more deference than makes sense.
Here’s the rough thought. We need to distinguish between (a) philosophical theories about the relationship between the proxies and welfare ranges and (b) models that attempt to express the relationship between proxies and welfare range estimates. We assume that there’s some correct theory about the relationship between the proxies and welfare ranges, but while there might be a best model for expressing the relationship between proxies and welfare range estimates, we definitely don’t assume that we’ve found it. In part, this is because of ordinary points about uncertainty. Additionally, it’s because the philosophical theories underdetermine the models: lots of models are compatible with any given philosophical theory; so, we just had to choose representative possibilities. (The 1-point-per-proxy and aggregation-by-addition approaches, for instance, are basically justified by appeal to simplicity and ignorance. But, of course, the philosophical theory behind them is compatible with many other scoring and aggregation methods.) So, there’s a worry that if we set things up the way you’re describing, we’re treating the models as though they were the philosophical theories, whereas it might make more sense not to do that and then make other adjustments for practical purposes in specific decision contexts if we’re worried about this.
Laura’s practical notes on this:
A change like the one you’re suggesting would likely decrease the variance in the estimates of f(): if you assume the welfare ranges are independent variables, you’d get samples where the undiluted experiences model is dominating the welfare range for, say, shrimp, while the neuron count model is dominating the welfare range for pigs. A quick practical way of dealing with this would be to cut off values of f() below the 2.5th percentile and above the 97.5th percentile.
Or, even better, I suggest sorting the welfare ranges from least to greatest, then using pairs of the ith-indexed welfare ranges for the ith estimate of f(). Since each welfare model is given the same weight, I predict this’ll most accurately match up welfare range values from the same welfare model (e.g., the first ~11% of sorted samples will be neuron count welfare ranges, etc.).
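For anyone who wants to try this sorting suggestion, here’s a minimal sketch; the variable names are hypothetical, and it assumes you already have equal-length arrays of mixture-model samples for each species:

```python
import numpy as np

def rank_paired_samples(samples_a, samples_b):
    """Pair the ith-smallest welfare range sample of species A with the
    ith-smallest of species B. Since each welfare model gets equal weight,
    sorted samples should roughly line up model-by-model (e.g., the lowest
    ~11% of both arrays coming from the neuron count model)."""
    return np.sort(samples_a), np.sort(samples_b)

# Usage, given mixture-model samples for each species:
# shrimp_sorted, pig_sorted = rank_paired_samples(shrimp_samples, pig_samples)
# ys = f(shrimp_sorted, pig_sorted)  # f applied element-wise
```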
Ultimately, however, given all the uncertainty in whether our models are accurately tracking reality, it might not be advisable to reduce the variance as such.
Thanks, this is great information! The concern you raised regarding distinguishing between philosophical theories and models makes a lot of sense. With that said, I don’t currently feel super satisfied with the practical steps you suggested.
On the first note, the impact of the correlation depends on the structure of f. Suppose I’m trying to estimate the total harms of eating chicken/pork, so we have something like y = c1 × (welfare range of pigs) + c2 × (welfare range of chickens). In this case, treating the welfare ranges of chickens and pigs as correlated will increase the variance of y. On the flip side, if we’re trying to estimate the welfare impact of switching from eating chicken to eating pork, we have something like y = c3 × (welfare range of chickens) − c4 × (welfare range of pigs). In that case, treating the welfare ranges of pigs and chickens as correlated will decrease the variance of y. Trying to address this in an ad hoc manner seems pretty challenging.
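For concreteness, this is just the standard variance identity, with X and Y standing for the two welfare ranges:

$$\operatorname{Var}(c_1 X + c_2 Y) = c_1^2 \operatorname{Var}(X) + c_2^2 \operatorname{Var}(Y) + 2 c_1 c_2 \operatorname{Cov}(X, Y)$$

When both coefficients are positive (the total-harms case), positive correlation adds the covariance term and inflates the variance; when the coefficients have opposite signs (the switching case), the same term is subtracted and shrinks it.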
On the second note, I think that’s basically treating the welfare capacities of, e.g., pigs and chickens as perfectly correlated with one another. That seems extreme to me, since I think a substantial portion of the uncertainty in the welfare ranges comes from uncertainty as to which traits each species has, not which philosophical theory of welfare is correct.
I come away still thinking that the procedure I suggested seems like the most workable of the approaches mentioned so far. To put a little more rigor behind this, here are some examples of plotting the welfare range estimates of chickens and pigs against one another under the different methods (uncorrelated sampling from the respective mixture distributions, sampling from the ordered distributions, and pair-wise sampling from the constituent models). In addition, there are some plots showing the impact of the different sampling methods on some toy analyses of the welfare impact of eating chicken/pork and of switching from eating chicken to eating pork (note that the actual numbers are not intended to be very representative). You can see that the trimming approach only makes sense in the second case, and that the paired sampling from constituent models approach produces distributions in between those for the uncorrelated case and those for the ordered case.
Note that when using the pair-wise sampling from constituent models approach, pigs and chickens are more strongly correlated with one another than many other pairs of species are. Here is what the correlation between chickens and shrimp looks like, for example:
Hey, thanks for this detailed reply!
When I said “practical”, I more meant “simple things that people can do without needing to download and work directly with the code for the welfare ranges.” In this sense, I don’t entirely agree that your solution is the most workable of them (assuming independence probably would be). But I agree—pairwise sampling is the best method if you have the access and ability to manipulate the code! (I also think that the perfect correlation you graphed makes the second suggestion probably worse than just assuming perfect independence, so thanks!)
Yeah that makes complete sense, it was a pain to get the pairwise sampling working.
Love this type of research, thank you very much for doing it!
I’m confused about the following statement:
Is this a species-specific suspicion? Or does a lower amount of (high-quality) research on a species generally reduce your welfare range estimate?
On average I’d have expected the welfare range estimate to stay the same with increasing evidence, but the level of certainty about the estimate to increase.
If you have reason to believe that the existing research is systematically biased in a way that would lead to higher welfare range estimates with more research, do you account for this bias in your estimates?
Great question, Tobias. Yes, less research on a species generally reduces our welfare range estimate. I agree with you that it would be better, in some sense, to have our confidence increase in a fixed estimate rather than having the estimates themselves vary. However, we couldn’t see how to do that without invoking either our priors (which we don’t trust) or some other arbitrary starting point (e.g., neuron counts, which we don’t trust either). In any case, that’s why we frame the estimates as placeholders and give our overall judgments separately: vertebrates at 0.1 or better, the vertebrates themselves within 2x of one another, and the invertebrates within 2 OOMs of the vertebrates.
This is really valuable work, and I look forward to seeing the discussion that it generates and to digging into it more closely myself. I did have one immediate question about the neuron count model specifically, though I recognize that it’s a small contributor to the overall weights. I’d be curious to understand how you arrived at 13 million neurons as your estimate for salmon. The reference in the spreadsheet is:
I don’t easily see how that translates to 13 million neurons. When I previously looked at this issue myself, I came away thinking it was possible that salmon had substantially more neurons than you’re estimating.
Thanks, MHR. Quick reply to say: Good question, but I don’t know the answer offhand, as I didn’t come up with that number myself. Many different people helped with the literature reviews. I’ll get in touch with the relevant person and get back to you.
Sorry for the delay, MHR! It took a bit to get to the bottom of this. In any case, the short version is that the 8-13M neuron count for both salmon and carp should be read as the lowest reasonable estimate, not our best guess. We got the number from the zebrafish literature—specifically, a study by Hinsch & Zupanc (2007) (cited in the table) who reported that the total number of brain cells for adult zebrafish varied between 8 and 13 million. In the notes associated with the Welfare Range Table, we had a caveat that neuron counts are very hard to come by in fish and, in any case, only represent a snapshot in time, because the teleost brain is constantly growing. Moreover, no one has done total neuron count estimates for salmon or carp, whereas zebrafish are often used as a model species and are well-studied; so, we simply used those values as a placeholder. Granted, then, the 8-13M number may well be an underestimate due to the size differences between zebrafish and salmon, and we do see the appeal of using Invincible Wellbeing’s curve fits to come up with a higher number. However, we tried to stick as close to the empirical literature as possible. And truth be told, because neuron counts are just one of several models we include, using a higher number wouldn’t make a major difference to our welfare range estimates for salmon or carp.
The upshot is that this is one of many cases where our methodology is more conservative than many EAs have been when doing related projects (e.g., we were more inclined to default to “unknown,” we used lower-bound placeholder values in some cases, etc.). Advantages and disadvantages!
Thanks Bob, that makes sense!
Just to see the magnitude of the change, I tried rerunning the model with a neuron count estimate of 100 million for salmon. That led to salmon’s 50th-percentile estimate increasing by 0.001 and 95th-percentile estimate increasing by 0.002. So you’re right that it’s not really a noticeable impact.
Hello to all,
Have you contacted the Integrated Information Theory group about this project? In my (dualistic naturalist) viewpoint, their work is the most advanced in the area of consciousness detection.
https://www.amazon.com/Sizing-Up-Consciousness-Objective-Experience/dp/0198728441
Of course, consciousness is absolutely noumenal, and the best part of their work focuses on the case where self-reported conscious experience is possible [humans], but they have tried to extrapolate to mathematical models applicable to any material system.
The last thing I read about Integrated Information Theory was Scott Aaronson’s criticism of it. Have his arguments been addressed? I found them very compelling.
Regarding the neurological part (the consciousness detector based on brain information) described in “Sizing Up Consciousness,” I think they are mostly right. The IIT mathematical model is beyond my understanding, and so is the Aaronson criticism. But given my naturalistic dualist vision of consciousness, unfortunately only an axiomatic and extrapolative approach to measuring consciousness is possible.
Good suggestion, Arturo. We haven’t reached out, but it’s certainly worth having a conversation.
Hey, I thought I’d make a Bayesian adjustment to the results of this post. To do this, I am basically ignoring all nuance. But I thought that it might still be interesting. You can see it here: https://nunosempere.com/blog/2023/02/19/bayesian-adjustment-to-rethink-priorities-welfare-range-estimates/
May be worth also updating on https://forum.effectivealtruism.org/posts/WfeWN2X4k8w8nTeaS/theories-of-welfare-and-welfare-range-estimates. Basically, you can roughly decompose the comparison as (currently achievable) peak human flourishing to the worst (currently achievable) human suffering (torture), and then that to the worst (currently achievable) chicken suffering. You could also rewrite your prior to be over each ratio (as well as the overall ratio), and update the joint distribution.
Seems like a good idea, but also a fair bit of work, so I’d rather wait until RP releases their value ratios over actually existing humans and animals, and update on those. But if you want to do that, my code is open source.
Thanks for all this, Nuno. The upshot of Jason’s post on what’s wrong with the “holistic” approach to moral weight assignments, my post about theories of welfare, and my post about the appropriate response to animal-friendly results is something like this: you should basically ignore your priors re: animals’ welfare ranges as they’re probably (a) not really about welfare ranges, (b) uncalibrated, and (c) objectionably biased.
You can see the posts above for material that’s relevant to (b) and (c), but as evidence for (a), notice that your discussion of your prior isn’t about the possible intensities of chickens’ valenced experiences, but about how much you care about those experiences. I’m not criticizing you personally for this; it happens all the time. In EA, the moral weight of X relative to Y is often understood as an all-things-considered assessment of the relative importance of X relative to Y. I don’t think people hear “relative importance” as “how valuable X is relative to Y conditional on a particular theory of value,” which is still more than we offered, but is in the right ballpark. Instead, they hear it as something like “how valuable X is relative to Y,” “the strength of my moral reasons to prioritize X in real-world situations relative to Y,” and “the strength of my concern for X relative to Y” all rolled into one. But if that’s what your prior’s about, then it isn’t particularly relevant to your prior about welfare-ranges-conditional-on-hedonism specifically.
Finally, note that if you do accept that your priors are vulnerable to these kinds of problems, then you either have to abandon or defend them. Otherwise, you don’t have any response to the person who uses the same strategy to explain why they assign very low value to other humans, even in the face of evidence that these humans matter just as much as they do.
I agree with a), and mention this somewhat prominently in the post, so that kind of sours my reaction to the rest of your comment, as it feels like you are responding to something I didn’t say:
and then later:
In any case, thanks for the references re: b) and c)
Re: b), it would in fact surprise me if my prior was uncalibrated. I’d also say that I am fairly familiar with forecasting distributions. My sense is that if you wanted to make the argument that my estimates are uncalibrated, you can, but I’d expect it’d be tricky.
Re: c), this is if you take a moral realist stance. If you take a moral relativist stance, or if I am just trying to describe what I do value, you have surprisingly little surface to object to.
Yes, that is part of the downside of the moral relativist position. On the other hand, if you take a moral realist position, my strong impression is that you still can’t convince, e.g., a white supremacist or an egoist that all lives are equal, so you still share that downside. I realize that this is a longer argument, though.
Anyways, I didn’t want to leave your comment unanswered but I will choose to end this conversation here (though feel free to reply on your end).
I am actually a bit confused about why you bothered to answer. Like, no answer was fine, an answer saying that you hadn’t read it but pointing to resources and pitfalls you’d expect me to fall into would have been welcome, but your answer is just weird to me.
I skimmed the piece on axiological asymmetries that you linked and am quite puzzled that you seem to start with the assumption of symmetry and look for evidence against it. I would expect asymmetry to be the more intuitive, and therefore default, position. As the piece says:
I would expect a difference in magnitude between the best possible pleasure and the worst possible pain to be the most obvious explanation, but the piece concludes that these judgments are “far more plausibly explained by various cognitive biases”.
As far as I can tell this would suggest that either:
Someone who has recently experienced or is currently experiencing intense suffering (and therefore has a better understanding of the stakes) would be more willing to take the kind of roulette gamble described in the piece. This seems unlikely.
People’s assessments of hedonic states are deeply unreliable even if they have recent experience of the states in question. I don’t like this much because it means we have to fall back on physiological evidence for human pleasure/suffering, which, as shown by the mayonnaise example, can’t give us the full picture.
On a slightly separate note, I played around with the BOTEC to check the claim that assuming symmetry doesn’t change the numbers much and I was convinced. The extreme suffering-focused assumption (where perfect health is merely neutral) resulted in double the welfare gain of the symmetric assumption (when the increase in welfare as a percentage of the animals’ negative welfare range is held constant).
My main question on this last point is: why use “percentage of the animals’ negative welfare range” when “percentage of the animals’ total welfare range” seems more relevant and would not vary at all across different (a)symmetry assumptions?
Thanks for reading that, Stan! Good question. I realize now that my report and the post together are a bit confusing, because there are two types of symmetry at issue that seem to get blended together. I could have been clearer about this in the report. Sorry about that!
First, the post mentions the concept of welfare ranges being *symmetrical around the neutral point*. Assuming this means assuming that the best realizable welfare state is exactly as good as the worst realizable welfare state. That is assumed for simplicity, though the subsequent part of the post is meant to show that that assumption matters less than one might think.
Second, in my linked report, I focus on the concept of *axiological symmetries* which concern whether every fundamental good-making feature of a life has a corresponding fundamental bad-making feature. If we assume this and, for instance, believe that knowledge is a fundamental good-making feature, then we’d have to think that there is a corresponding fundamental bad-making feature (unjustified false belief, perhaps).
These concepts are closely related, as the existence of axiological asymmetries may provide reason to think that welfare is not symmetrical around the neutral point and vice versa. Nevertheless, and this is the crucial point, it could work out that there is complete axiological symmetry, yet welfare ranges are still not symmetrical around the neutral point. This could be because some beings are constituted in such a way that, at any moment in time, they can realize a greater quantity of fundamental bad-making features than fundamental good-making features (or vice versa).
Axiological asymmetries seem prima facie ad hoc. Without some argument for specific axiological asymmetries and without working out their axiological implications, I do think axiological symmetry should be the default assumption. There’s some nice discussion of this kind of issue in the Teresa Bruno-Niño paper cited in the report. In fact, it seems to me that both (what she calls) continuity and unity are theoretical virtues.
https://www.pdcnet.org/msp/content/msp_2022_0999_11_25_29
Now, even granting what I just wrote about axiological symmetry, perhaps the default assumption should be that welfare is not symmetrical around the neutral point for the reasons you gave. That seems totally reasonable! I personally don’t have strong views on this. Though, I do think there is a good evolutionary debunking argument to give for why animals (including humans) might be more motivated to avoid pain than accrue pleasure and why humans might be disposed to be risk-averse in the roulette wheel example. I’m genuinely not sure how much these considerations suggest that the default is that welfare is not symmetrical around the neutral point.
Whether welfare is symmetrical around the neutral point is largely an empirical question, though. I wouldn’t be surprised if we discover that welfare is not symmetrical around the neutral point; that’s a very realistic possibility. By contrast, though it remains a viable possibility, I would be somewhat surprised if we discover any axiological asymmetries.
Thanks for your questions, Stan. Travis wrote the piece on axiological asymmetries and he can best respond on that front. FWIW, I’ll just say that I’m not convinced that there’s a difference of an order of magnitude between the best pleasure and the worst pain—or any difference at all—insofar as we’re focused on intensity per se. I’m inclined to think it’s just really hard to say and so I take symmetry as the default position. For all that, I’m open to the possibility that pleasures and pains of the same intensity have different impacts on welfare, perhaps because some sort of desire satisfaction theory of welfare is true, we’re risk-averse creatures, and we more strongly dislike signs of low fitness than the alternative. Point is: there may be other ways of accommodating your intuition than giving up the symmetry assumption.
To your main question, we distinguish the negative and positive portions of the welfare range because we want to sharply distinguish cases where the intervention flips the life from net negative to net positive. Imagine a case where an animal has a symmetrical welfare range and an intervention moves the animal either 60% of their negative welfare range or 60% of their total welfare range. In the former case, they’re still net negative; in the latter case, they’re now net positive. If you’re a totalist, that really matters: the “logic of the larder” argument doesn’t go through even post-intervention in the former case, whereas it does go through in the latter.
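To make that concrete with made-up numbers: suppose the welfare range is [−1, +1], the neutral point is 0, and the animal starts at −1. Moving it 60% of its negative welfare range takes it to −1 + 0.6 × 1 = −0.4, which is still net negative; moving it 60% of its total welfare range takes it to −1 + 0.6 × 2 = +0.2, which is net positive. The same “60%” gives opposite answers to the question the larder argument turns on.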
If these estimates will be used as multipliers for a hedonistic/suffering scale based on WFP’s pain intensity levels (as was done here recently), then the undiluted experience model might contradict the definition of disabling pain, and probably contradicts the definition of excruciating pain, because these can’t be ignored and they take up most or ~all of an animal’s attention, by definition. Furthermore, I think what you’d want to do instead anyway, if using WFP’s pain scale, is just use an equality model and assess more carefully where an animal is on WFP’s pain scale, taking into account potential distractions. Dilution wouldn’t change the badness of a given level of suffering (affective component of physical and psychological pain, which is what I think WFP’s scale is supposed to capture); it would reduce the level of suffering, and so move the experience towards the milder end of WFP’s pain scale. I’m confident that excruciating pain in humans is never or rarely significantly diluted (just through distraction by things other than similarly intense pain), and I doubt that disabling pain is significantly diluted, too.
WFP also has a post on the role of attention here, and, related to this, they wrote (bold mine):
I also worry about most of the qualitative/non-quantitative models basically double-counting animals’ responses, if used as multipliers for WFP’s pain scale. Some animals may just not be capable of experiencing excruciating pain at all, but that should just be captured in the probability that they are in fact experiencing excruciating (or disabling) pain under given conditions, not as a multiplier for the badness of excruciating pain, except possibly for reasons that really do stack on top of excruciating pain. Maybe the number of JNDs or conscious subsystems stack on top, which are reflected in the quantitative models, but few if any of the qualitative indicators seem like they should stack on top.
I would personally shift the probabilities assigned to the qualitative models to the equality model, when you want to use the welfare ranges estimates as multipliers for a WFP pain intensity scale.[1]
But then, this also makes a uniform prior across the original subset of models look weird/suspicious.
Should we instead use a uniform prior over the new subset of models for multiplying WFP scales? Your credences in the models shouldn’t be sensitive to something like this.
Do the estimates for black soldier flies primarily reflect adults? If we wanted to use an estimate for BSF larvae or mealworms, should we use the BSF estimates, the silkworm estimates (which presumably reflect the larvae, or else you’d call them silkmoths?), something in-between (an average?) or something else?
Great question, Michael. It’s probably fine to use the silkworm estimates for this purpose.
@Laura Duffy @Bob Fischer
A question about your methodology: If I understand correctly, your placeholders are probability-of-sentience-adjusted, but your key takeaways are not (since they are “conditional on sentience”).
Why adjust for sentience in your placeholders but not in your key takeaways?
Good question, Keyvan. This was pragmatic: our main goal was to make a point about welfare ranges, not p(sentience), so we wanted to discuss things that way in the key takeaways. But knowing people would want a single number per species to play with in models, we figured we should give people placeholders that are already adjusted.
Thanks for your reply Bob :)
Hi Bob,
Could you clarify how you aggregated the welfare range distributions from the 8 models you considered? I understand you gave the same weight to all of these 8 models, but I did not find the aggregation method here.
I would obtain the final cumulative distribution function (CDF) of the welfare range by aggregating the CDFs of the 8 models with the geometric mean of odds, as Epoch did to aggregate judgement-based AI timelines. I think Jaime Sevilla would suggest using the mean in this case:
However, I would say the 8 welfare range models are closer to the “all-considered views of experts” than to “models with mutually exclusive assumptions”. In addition:
The mean ignores information from extremely low predictions, and overweights outliers.
The weighted/unweighted geometric mean of odds (and also the geometric mean) performed better than the weighted/unweighted mean on Metaculus’ questions.
Samotsvety aggregated predictions differing a lot between them from 7 forecasters[1] using the geometric mean after removing the lowest and highest values (and the geometric mean is more similar to the geometric mean of odds than to the mean).
For the question “What is the unconditional probability of London being hit with a nuclear weapon in October?”, the 7 forecasts were 0.01, 0.00056, 0.001251, 10^-8, 0.000144, 0.0012, and 0.001. The largest of these is 1 M (= 0.01/10^-8) times the smallest.
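In case it’s useful, here’s a minimal sketch of the aggregation I have in mind, applied pointwise to the models’ CDFs on a shared grid; the grid and the per-model CDFs below are placeholders, not RP’s actual distributions:

```python
import numpy as np

def aggregate_cdfs_geo_odds(cdfs, eps=1e-9):
    """Aggregate model CDFs pointwise via the geometric mean of odds.

    cdfs: array of shape (n_models, n_grid_points); each row is a CDF
    evaluated on a common grid of welfare range values.
    """
    p = np.clip(cdfs, eps, 1 - eps)           # avoid division by zero
    odds = p / (1 - p)                        # probabilities -> odds
    geo = np.exp(np.log(odds).mean(axis=0))   # geometric mean across models
    return geo / (1 + geo)                    # odds -> probabilities

# Placeholder example: 8 models' CDFs (uniform on [0, s]) on a shared grid.
grid = np.linspace(0, 2, 201)
cdfs = np.array([np.clip(grid / s, 0, 1) for s in np.linspace(0.5, 2, 8)])
print(aggregate_cdfs_geo_odds(cdfs)[:5])
```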
Hi Bob,
Do you have any thoughts on the feasibility of extending your framework to estimate the welfare range of non-biological systems, namely advanced AI models like GPT-4? It naively looks like some of the models you considered to estimate the welfare ranges could apply to AI systems. I wish discussions about artificial sentience moved from “is this AI system sentient” to “what is the expected welfare range of this AI system”...
Short version: strongly agree with you about the importance of shifting the conversation from sentience to welfare ranges, but I think that the issue is basically intractable given hedonism at this juncture, as we have no reason to think that any of the states that could be mental states in AI systems are type identical to any of the states in biological organisms. It isn’t intractable given other theories of welfare, though, and depending on your views about what moral weights represent, a “moral weight” for AI systems might still be available. However, we’d need a different methodology for that than the one we outline here.
Thanks and congratulations to the RP team for your work on this. This is incredibly thorough and useful!
Having looked at the whole Moral Weight Project sequence in some detail, I have some uncertainties around the following question/objection that you list above:
“Your literature review didn’t turn up many negative results. However, there are lots of proxies such that it’s implausible that many animals have them. So, your welfare range estimates are probably high.”
In your response you write that this is a good objection.
However, as I understand it, whenever proxies were unknown, you assumed these to be zero (i.e. not present). For instance, in your methodology writeup, I read: “Assigning proxies labeled “Unknown” zero probability of being present is certainly leading to underestimates of the welfare ranges and probabilities of sentience.”
Somehow I cannot square these two statements. Can you resolve that seeming contradiction for me?
Thanks for your question, Moritz. We distinguish between negative results and unknowns: the former are those where there’s evidence of the absence of a trait; the latter are those where there’s no evidence. We penalized species where there was evidence of the absence of a trait; we gave zero when there was no evidence. So, not having many negative results does produce higher welfare range estimates (or, if you prefer, it just reduces the gaps between the welfare range estimates).
Thanks for the explanation, Bob. That absolutely makes sense! I was somehow assuming that negative results would count as zeros as well.
Thanks for the writeup. Not an area I know much about. Interested to hear what you think the priorities are for further research in this area.
I liked the common questions & responses section—very helpful for someone like me who is new to this topic.
What surprised me—perhaps it shouldn’t have—is that you think it’s plausible that some animals have a larger welfare range than humans…
Appreciate the comment!
Re: further research priorities, there are “within paradigm” priorities and “beyond paradigm” priorities. As for the former, I think the most useful thing would be a more thorough investigation of theories of valence, as I think we could significantly improve the list of proxies and our scoring/aggregation methods if we had a better sense of which theories are most promising. As for the latter, my guess is that the most useful thing would be figuring out whether, given hierarchicalism, there are any limits at all on discounting animal welfare simply because it belongs to animals. My guess is “No,” which is one of the problems with hierarchicalism, but it would be good to think this through more carefully.
Re: some animals having larger welfare ranges than humans, we don’t want to rule out this possibility, but we don’t actually believe it. And it’s worth stressing, as we stress here, that this possibility doesn’t have any radical implications on its own. It’s when you combine it with other moral assumptions that you get those radical implications.
so although I’m not worth only 3 chickens, the key takeaway is that I’m worth around 50 chickens, is that the deal?
Thanks for your question, Sabs. Short answer: if (a) you think of your value purely in terms of the amount of welfare you can generate, (b) you think about welfare in terms of the intensities of pleasures and pains, (c) you’re fine with treating pleasures and pains symmetrically and aggregating them accordingly, and (d) you ignore indirect effects of benefitting humans vs. nonhumans, then you’re right about the key takeaway. Of course, you might not want to make those assumptions! So it’s really important to separate what should, in my view, be a fairly plausible empirical hypothesis—that the intensities of many animals’ pleasures and pains are pretty similar to the intensities of humans’ pleasures and pains—from all the philosophical assumptions that allow us to move from that fairly plausible empirical hypothesis to a highly controversial philosophical conclusion about how much you matter.
I think you should put this in big letters on the graph, and Peter should write it in his tweet thread. Currently this is going to get misunderstood, and since you can predict this, I suggest it’s your responsibility to avoid it.
That graph and all tables need to be hard to share without the provisos you’ve given here.
Added clarification to Twitter thread—thanks
Thanks, Nathan. This is a good point.
In particular, Edouard of Our World in Data said that they really care about their graphs being understood well, and that when they see a graph being misread or carrying a bad legend, they change it.
I think this is the right approach to ensure that graphs are shared with context.
I’ve redone the summary image, Nathan. Thanks again for recommending this.
Really appreciate this thread ^. I’m impressed that something misleading got pointed out by Nathan/Sabs and then was immediately improved.
Minor comment: I’d maybe re-title the image to something like “For each species, an estimate of their welfare range” or “Estimated welfare ranges per year of life of different species” ? I find “Placeholder Welfare Range Estimates (Life Years)” somewhat hard to parse. Although having written this, I’m not sure that my suggestions are better.
(And thanks for writing the post and working on this project!)
Good of you to say, Lizka. Thanks.
Re: the title of the image, that’s a helpful suggestion. I’m genuinely unsure what’s best. The most accurate title would be something like, “Welfare range estimates by species for welfare-to-DALYs-averted conversions,” but that doesn’t win any awards for accessibility.
It’s also per period of time, and humans live much longer than chickens.
ok. Well I don’t actually care about how much I think I matter (obviously the answer is “an enormous amount”), what I really care about is how much you think I matter, or how much the median EA thinks I matter. How many of these four assumptions you listed do you actually believe? If you do believe some of them, then presumably in your eyes I am worth some relatively low number of chickens, right? What happens if my neck is on the block and you have the choice between sacrificing me or wringing the necks of 100 chickens? That’s the really important key question here.
Hi Sabs. We can discuss this a bit in a comment thread, but the issues here are complicated. If you’d like to have a conversation, I’m happy to chat. Please DM me for a link to my calendar.
Brief replies to your questions:
I think you matter an enormous amount too. I am not saying this facetiously. It’s probably the thing I believe most deeply.
I don’t know how much the median EA thinks you matter.
I’m unsure about all four assumptions. However, I’m also unsure about their practical importance. You might not be comfortable with the results of any cross-species cost-effectiveness analysis.
If it’s you or a hundred chickens, I’d save you. I’d also save my children over a hundred (human) strangers. I don’t think this means that my children realize more welfare than those strangers. Likewise, I don’t think you realize 100x more welfare than a chicken can.
I think it’s also helpful to empathise the other way around when working on these thought experiments. In this report, species membership is merely a shortcut for speaking about typical cognitive and hedonic capacities; species itself is irrelevant. You might be thinking that prioritising the torture of 1,000 chickens over the life of one human being doesn’t make you feel as valued as you should be.
But it could go the other way around in real life as well. An illness could befall us or our loved ones. It could very well be the case that my sister had cognitive/hedonic capacities comparable to a pig’s. I wouldn’t feel very valued if my sister’s being tortured for a year were considered less of a big deal than averting a typical human being’s 10-minute headache.
I will respond with my interpretation of the report, so that the author might correct me to help me understand it better.
If you ask “If we have an option between preventing the birth of Sabs versus preventing the birth of an average chicken, how many chickens is Sabs worth?” then Sabs might be worth −10 chickens since chickens have net negative lives whereas you (hopefully) have a net positive life.
If you ask “Let’s compare a maximally happy Sabs and maximally happy chickens, how many chickens is Sabs worth?”, I don’t think these estimates respond to that either. It might be the case that chickens have a very large welfare range, but this is mostly because they have a potential for feeling excruciating pain even though their best lives are not that good.
I think you need to complement this research with “how much the badness of average experiences of animals compare with each other” to answer your question. This report by Rethink Priorities seems to be based on the range between the worst and the best experiences for each species.
This is exactly right, Emre. We are not commenting on the average amount of value or disvalue that any particular kind of individual adds to the world. Instead, we’re trying to estimate how much value different kinds of individuals could add to the world. You then need to go do the hard work of assessing individuals’ actual welfare levels to make tradeoffs. But that’s as it should be. There’s already been a lot of work on welfare assessment; there’s been much less work on how to interpret the significance of those welfare assessments in cross-species decision-making. We’re trying to advance the latter conversation.
Thank you for the prompt reply Bob. Just to be clear, I am happy about the scope of this project and am impressed by its quality. I do not intend to criticise the report for being mindful about its scope.
Didn’t take it that way at all! I appreciate your taking the time to comment and help clarify what we’ve done.
I love this research! Thank you so much for doing it!
My gut reaction to the results is that it’s odd that humans are so high up in terms of their capacity for welfare. Just as an uninformative prior, I would’ve expected us to be somewhere in the middle. Less confidently, I would’ve expected a similar number of orders of magnitude deviation from the human baseline in either direction, within reason. E.g. +/- ~.5 OOM.
Plus, we are humans, so there’s a risk that we’re biased in our favor. It could simply be a bias from our ability to empathize with other humans. But it could also be the case that there are countless more markers of sentience that humans don’t have (but many other sentient animals do) that we are prone to overlook.
Have you investigated what the sources of this effect might be? There might be any number of biases at work, as I mentioned, but perhaps our lives have become so comfy most of the time that we perceive slight problems very strongly (e.g., a disapproving gaze). If something really bad then happens, it feels enormously bad?
(I’ve in the past explicitly assumed that most beings with a few (million) neurons have a roughly human capacity for welfare – not because I thought that was likely but because I couldn’t tell in which direction it was off. Do you maybe already have a defense of the results for people like me?)
In any case, I’ll probably just adopt your results into my thinking now. I don’t expect them to change my priorities much given all the other factors.
Thank you again! <3
Update: When I mentioned this to a friend on a hike, I came up with two ways in which the criteria might be amended to include nonhuman ones: (1) In many cases, we probably have a theory for why a particular behavior or feature is likely to be indicative of conscious experience. Understanding this mechanism, we can look for other systems that might implement the same mechanism, sort of how the eyes of humans, eagles, and flies are very different but we infer that they are probably all for the purpose of vision. (2) Maybe a number of animals that show certain known criteria for consciousness also suspiciously consistently share some other features. One could then investigate whether these features are also indicative of consciousness and whether there are other animals that have these new features at the expense of the older, known ones. (The analysis could cluster features that usually co-occur so as not to overweight causally related features in cases where many of them are observable.)
Hi Bob,
Great work!
I think it would be nice to have all the estimates in the table here with 3 significant digits, in order not to propagate errors. I understand more digits may give a sense of false precision, but you provide the 5th and 95th percentiles in the same table, so I suppose the uncertainty is already being conveyed.
Why do you give estimates for the median moral weight, instead of the mean moral weight? Normally, we care about expectations...
Thanks, Vasco!
Short version: I want to discourage people from using these numbers in any context where that level of precision might be relevant. That is, if the sign of someone’s analysis turns on three significant digits, then I doubt that their analysis is action-relevant.
As for medians rather than means, our main concern there was just that means tend to be skewed toward extremes. But we can generate the means if it’s important!
Finally, I should stress that I’m seeing people use these “moral weights” roughly as follows: “100 humans = ~33 chickens (100*.332= ~33).” This is not the way they’re intended to be used. Minimally, they should be adjusted by lifespan and average welfare levels, as they are estimates of welfare ranges rather than all-things-considered estimates of the strength of our moral reasons to benefit members of one species rather than another.
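As a rough illustration of the kind of adjustment I mean (the symbols here are mine, not a formula from the post), a cross-species comparison would need something like

$$\text{welfare at stake} \approx n \times W \times \Delta f \times T,$$

where n is the number of animals affected, W is the species’ welfare range, Δf is the change in the fraction of that range the animals realize, and T is the duration of the effect. The welfare range is just one factor among several.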
Hi again,
Sorry, I forgot to touch on this point:
Do you think the extremes of your moral weight distributions are reasonable? If so, then even though the mean is skewed towards them, it would be the more accurate statistic. Anyway, I would say sharing the mean is important so that people can see how much influence the extremes have (i.e., how heavy-tailed the moral weight distribution is).
Sorry for the slow reply, Vasco. Here are the means you requested. My vote is that if people are looking for placeholder moral weights, they should use our 50th-pct numbers, but I don’t have very strong feelings on that. And I know you know this, but I do want to stress for any other readers that these numbers are not “moral weights” as that term is often used in EA. Many EAs want one number per species that captures the overall strength of their moral reason to help members of that species relative to all others, accounting for moral uncertainty and a million other things. We aren’t offering that. The right interpretation of these numbers is given in the main post as well as in our Intro to the MWP.
Thanks for clarifying and sharing the means, Bob! There are some significant differences to the medians for some species, so it looks like it would be important to see whether the extremes of the distributions are being well represented.
Thanks for clarifying!
I thought this would be the reason. That being said, I still think it makes sense to present the results with 2 or 3 significant digits whenever the uncertainty is already being conveyed. For example, if I say the mean moral weight is 1.00, and the 5th and 95th percentiles are 0.00100 and 1.00 k, it should be clear that the result is pretty uncertain, even though all numbers have 3 significant digits.
I agree in general, but wonder whether for some cases it may matter in a non-crucial way. For example, the ratio between 1.50 and 2.49 is 0.602 without rounding, but 1 if we round both numbers to 2. An error of a factor of 0.602 may not be crucial, but it will not necessarily be totally negligible either.
Ahah, I agree! They are supposed to be used as follows: “100 chickens = 100*0.332 humans = 33.2 humans”. One should always be careful not to interpret the moral weight of chickens relative to humans as that of humans relative to chickens, and also present the final result with 3 significant digits instead of 2.
Jokes aside, when I read “[based on RP’s median moral weights] 100 chickens = 33.2 humans”, I assume that the duration and intensity of experience (relative to the moral weight) are being treated as the same for both humans and chickens, because that is what the moral weight alone tells us. However, if one says “saving x humans equals saving y chickens”, I agree the moral weights have to be combined with other variables, because now we are describing the consequences of actions instead of just a direct comparison of experiences.
The probability of sentience is multiplied through here, right? Some of these animals are assigned <50% probability of sentience but have nonzero probability-of-sentience-adjusted welfare ranges at the median. Another way to present this would be to construct the random variable that’s 0 if they’re not sentient and otherwise equal to the random variable representing their moral weight conditional on sentience. This would be your actual distribution of welfare ranges for the animal, accounting for their probability of sentience. That being said, what you have now might be more useful as a range of expected moral weights for (approximately) risk-neutral EV-maximizing utilitarians, representing deep uncertainty or credal fragility.
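For concreteness, here’s a minimal sketch of the mixture random variable described above; the probability of sentience and the conditional distribution are placeholders, not RP’s numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

def sentience_adjusted_samples(p_sentience, conditional_samples):
    """Welfare range as a mixture: 0 with probability (1 - p_sentience),
    otherwise a draw from the welfare-range-conditional-on-sentience
    distribution."""
    sentient = rng.random(len(conditional_samples)) < p_sentience
    return np.where(sentient, conditional_samples, 0.0)

# Placeholder numbers: 40% probability of sentience and a lognormal
# conditional welfare range (parameters made up for illustration).
conditional = rng.lognormal(mean=-2.0, sigma=1.0, size=100_000)
adjusted = sentience_adjusted_samples(0.4, conditional)
print(np.median(adjusted))  # the median is 0 whenever p_sentience < 0.5
```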
Expected values don't seem useful here. Your confidence intervals are huge (the 95% interval for pig suffering capacity relative to humans runs from 0.005 to 1.031). Because the implications are so different across that spectrum (varying from basically "make the cages even smaller, who cares" at 0.005 to "I will push my nan down the stairs to save a pig" at 1.031), it really doesn't feel like I can draw any conclusions from this.
Fair enough, Henry. We have limited faith in the models too. But as we said:
The numbers are placeholders.
Our actual views are summarized in the key takeaways and again toward the end (e.g., vertebrates within an order of magnitude of humans, i.e., 0.1 or above, which certainly does make a practical difference).
This work builds on everything else we’ve done and is not, all on its own, the complete case for relatively animal-friendly welfare range estimates.
To follow up on Bob’s point, the ranges presented here are from a mixture model which combines the results from several models individually. You can see the results for each model here: https://docs.google.com/spreadsheets/d/1SpbrcfmBoC50PTxlizF5HzBIq4p-17m3JduYXZCH2Og/edit?usp=sharing
For example, the 0.005 arises because we are including the neuron count model of welfare ranges in our overall estimates. If you don’t include this model (as there are good reasons not to, see https://forum.effectivealtruism.org/posts/Mfq7KxQRvkeLnJvoB/why-neuron-counts-shouldn-t-be-used-as-proxies-for-moral) then the 5th percentile welfare range for pigs of all models combined is 0.20.
The 1.031 comes from a model called the "Undiluted Experiences" model, which suggests that animals with lower cognitive abilities have greater welfare ranges because they are less able to rationalize their feelings (e.g. pets being anxious when you're packing for a trip). A somewhat different model is the "Higher-Lower Pleasures" model, built on the idea that higher cognitive capacities let you experience more welfare (akin to J.S. Mill's idea of higher-order pleasures). Under this model, we estimate that the range for pigs is 0.23 to 0.49, which is quite significant given that this model could be seen as having a pro-human bias!
In sum, the welfare ranges presented above reflect our high degree of uncertainty surrounding how to think about measuring welfare. As such, we invite you to take a closer look at each model (you’ll find most of them converge on the overall conclusion that vertebrates are within an order of magnitude of humans in terms of their welfare ranges).
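For anyone unfamiliar with mixture models, here is a rough sketch of the mechanics, with hypothetical component distributions and assumed equal weights (the real per-model results and weights are in the linked spreadsheet):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical stand-ins for three component models (shapes are made up):
samples = np.stack([
    rng.lognormal(np.log(0.02), 1.5, n),   # a low-tailed model (e.g. neuron counts)
    rng.lognormal(np.log(0.9), 0.3, n),    # a high model (e.g. undiluted experiences)
    rng.uniform(0.23, 0.49, n),            # a mid-range model
])
weights = np.array([1/3, 1/3, 1/3])        # assumed equal weights

# Mixture: for each draw, pick a component by weight, then take its sample
picks = rng.choice(len(weights), size=n, p=weights)
mixture = samples[picks, np.arange(n)]

print(np.percentile(mixture, [5, 50, 95]))
```

With this structure, the extreme percentiles of the mixture are driven almost entirely by the most extreme component, so dropping one component (as with the neuron count model above) can move the 5th percentile a lot while affecting the median much less.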
I'm curious whether you've indicated parental care as "present" or "absent" in bees. I briefly checked the linked documents but couldn't find where that lives; maybe I missed it. Can anyone link to that documentation?
(Bees provide care to young, but it’s primarily done by siblings, not parents, so it’s considered alloparental care, not parental care. I should think that probably counts, but wasn’t sure.)
Sorry about the confusion, mvolz. The table with the models is tricky to navigate. Here’s the one we shared originally, which is clearer. Short answer: yes, we said it was present.
This project seems interesting, but I think you’re importantly wrong when you say that we shouldn’t dismiss your results based on our intuitions.
Our intuitions are our ONLY guide to the possible internal mental states of other animals. You are right to try to systematize: to find scientific, measurable criteria that match our intuitions on easier cases and help shape them on harder ones. However, whatever criteria you find NECESSARILY rely on intuitions, for example in your decision of what to include or not include as a proxy. Also, the only way we could judge your theory, even in principle, is by our intuitions.
If someone tells you "after reviewing all the behavioral and cognitive abilities of chickens (and after deeply understanding the proxies you've listed), I reject your estimates of their welfare range," there's nothing you can say to that person. Therefore, saying things like "don't dismiss our results based on your intuitions"
is just exaggerating the objectivity of your evidence. It's not objective; it's more like, objectively subjective. You remind me of the Integrated Information Theory people, who, when confronted with "your theory implies a CD drive is more conscious than a human", answer back with "yes, CD drives are more conscious, don't rely on your intuitions to dismiss empirical measurements of consciousness". You're making a fundamental error! Intuitions are SUPERIOR to fancy theories that say CD drives are more conscious than humans, no matter how fancy those theories are.
Now, happily, your theory does not say CD drives are more conscious than humans. But it does say that one single beehive is worth more than 10 humans, even after adjusting for lifespan. Literally you are claiming that improving the lives of the bees in a suffering beehive (only until they die naturally a month later!) is more important than permanently healing a paraplegic—by an order of magnitude. You are claiming that, though the suffering a human can experience in an hour may be vast, it is not quite as vast as the suffering-per-hour which can be experienced by (checks notes) 15 bees.
I appreciate the work you put into this, and I think there’s much to learn from it (I’ve already learned from your reports, despite not reading everything). But you are wrong about your placeholder range estimates—importantly, obviously wrong, orders-of-magnitude wrong, these-numbers-should-not-be-used-for-anything wrong.
I don’t want to over-criticize because I think being wrong is much better than doing nothing! But I’m afraid of people citing these estimates without realizing how off they are from what normal people value (or what normal people would value even if perfectly informed).
Thanks for reading, LGS. As I’ve argued elsewhere, utilitarianism probably leads us to say equally uncomfortable things with more modest welfare range estimates. I’m assuming you wouldn’t be much happier if we’d argued that 10 beehives are worth more than a single human. At some point, though, you have to accept a tradeoff like that if you’re committed to impartial welfare aggregation.
For what it’s worth, and assuming that you do give animals some weight in your deliberations, my guess is that we might often agree about what to do, though disagree about why we ought to do it. I’m not hostile to giving intuitions a fair amount of weight in moral reasoning. I just don’t think that our intuitions tell us anything important about how much other animals can suffer or the heights of their pleasures. If I save humans over beehives, it isn’t because I think bees don’t feel anything—or barely feel anything compared to humans. Instead, it’s because I don’t think small harms always aggregate to outweigh large ones, or because I give some weight to partiality, or because I think death is much worse for humans than for bees, or whatever. There are just so many other places to push back.
But what you think does tell us about how much other animals can suffer or the height of their pleasures is… listing out ~100 traits like “bees play” and “bees don’t express love”, plugging them into a complex model, and getting “14 bees is around 1 human” as the answer.
That’s not any better! You’re merely hiding your intuitions behind a complex model. But the inputs to your model are no better than the inputs to mine! Your inputs are things like “bees play”, which I’m already inputting into my intuition (along with many other facts your model cannot take into account). You’re weighing all proxies equally; my intuition uses a very skewed weighted average of the proxies. But the uniform distribution is not special! It’s just as arbitrary!
Would it help if I coded up a simple model which took in “bees play but don’t express love” and the rest of the list, and outputted 0.00000001? We surely both agree that I can do it. What makes you confident your model is more justified, if not your own intuitions?
This sounds perilously close to “our numbers shouldn’t be used for anything”. Is there any decision you’d be comfortable making based on your numbers (the 14 bees is 1 human thing)?
Hi LGS. A few quick points:
You don’t know what my intuitions about bees were before we began, nor what they are now. FWIW, I came into this project basically inclined to think of insects as little robots. Reading about them changed what I think I should say. However, my intuitions probably haven’t shifted that much. But as we’ve seen, I place less weight on my intuitions here than you do.
You’re ignoring what we say in the post: our actual views, which are informed by the models but not directly determined by them, are that the verts are within one OOM of humans and inverts are within 2 OOMs of the verts. The specific values are, as we indicate, just placeholders.
We tried to develop a methodology that makes our estimates depend on the state of empirical knowledge. I’ll be the first to point out its limitations. If we’re listing criticisms, I’m worried about things like generalizing within taxonomic categories, the difficulty of scoring individual proxies, and the problem of handling missing data—not “hiding our intuitions behind a complex model.”
I want to do better going forward. This is the first step in an iterative process. If you have concrete suggestions about how to improve the methodology, please let me know.
(I work at RP and reviewed parts of this work, but am not a co-author for this report and am not speaking for RP or the authors.)
I think this is partly why they considered so many different models, with different sets of proxies, as well as the grouped proxies model. This effectively represents a range of different possible weights, although it might not cover your views.
I may be misunderstanding, but because it seems you've already decided on the answer (0.00000001), I'd worry about motivated reasoning. If you were going to make a model, I'd recommend a first pass without looking at how the criteria (or model outputs) differ between species, while trying to give plausible accounts of why particular criteria matter and why they matter as much as you think they should. Of course, the fact that bees can or can't do something could also be evidence about the value of the criteria. For example, maybe some criterion you thought was strong evidence for a particular cognitive mechanism is met by bees, but you have substantial reason to believe bees lack this cognitive mechanism, so you could be justified in reducing the weight on that proxy after learning bees meet the criterion (or just using the cognitive mechanism directly as a criterion). Ideally, you should be able to justify such changes to your weights with a better story for why they were wrong before than just "bees do it", which I think would be motivated reasoning.
I would definitely be interested in hearing which proxies you’d include, how you’d weigh them and why. A model might be useful, although I think the reasoning would be the most useful part.
Personally, when I think of the intensity of suffering and pleasure, the kinds of (largely vague) accounts I have in mind depend on attention and prioritization, or can use those as proxies scaling roughly monotonically with intensity (similar to Welfare Footprint Project's definitions for levels of pain: annoying, hurtful, disabling, and excruciating), and humans don't seem particularly special here for excruciating pain and extreme fear (e.g. torture), i.e. I expect mammals and birds to respond with attention and prioritization similar to humans.
I’m not sure either way if bees would attend to and prioritize anything to the same degree we do our most intense states while in them, so they might have narrower hedonic ranges if they don’t, but I don’t think we can rule it out. The mechanisms may be different, even very different, but those differences might not matter, and they won’t necessarily favour humans over bees rather than bees over humans. So, it just doesn’t seem extremely unlikely to me that bees have hedonic ranges similar to ours. Also, the extremes for humans might not even be very relevant, given that little (neartermist) EA funding seems to be targeted at them.
Two other ways bees might have substantially narrower (expected) hedonic ranges than humans are conscious subsystems (I'm a co-author on that post, and the idea can be roughly captured by the neuron count model here) and the just noticeable differences model (assuming bees do have fewer JNDs), or possibly some combination of the two, possibly under different scaling laws than here.
It’s hard for me to imagine why other things would matter much for hedonic ranges. At least, I haven’t come across any other plausible arguments that favoured humans by orders of magnitude.
Well, first of all, we should be very uncertain that species sufficiently far from us are even capable of suffering at all. I take it as self-evident that suffering is in the brain, a property of neurons; however, I also think that small machine learning models clearly don’t experience suffering, so “having something that’s kinda maybe like neurons” is not sufficient.
Bees have fewer than a million neurons and likely fewer "parameters" than even small language models like BERT-base. Moreover, a lot of those neurons are likely used for controlling the wings and for implementing hardcoded algorithms like "build a hex-tiled beehive".
I don’t believe a single neuron, by itself, can experience pain. It’s an emergent phenomenon. But the fact that it’s emergent suggests it might even scale super-linearly with the number of neurons (e.g. perhaps quadratically, for the number of possible interactions between neurons, though that’s far from certain). I find the assumption of sub-linear scaling (or no scaling at all) to be particularly weird.
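To see how much the choice of scaling exponent alone matters, here is a back-of-the-envelope comparison, using ~10^6 neurons for a bee (per the figure above) and the standard ~8.6 x 10^10 estimate for the human brain; the exponents themselves are purely illustrative:

```python
bee_neurons = 1e6        # "fewer than a million", rounded up
human_neurons = 8.6e10   # standard estimate for the human brain

for label, k in [("sublinear (sqrt)", 0.5), ("linear", 1.0), ("quadratic", 2.0)]:
    ratio = (bee_neurons / human_neurons) ** k
    print(f"{label}: bee/human welfare-range ratio ~ {ratio:.1e}")
```

Moving between these exponents swings the implied ratio by roughly seven orders of magnitude, which is why the scaling assumption does nearly all the work in arguments like this.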
Apart from the neuron count, there's the issue that bees are so evolutionarily far from us that they basically developed their intelligence independently of ours. Our common ancestor with them was an early bilaterian: a primitive worm with few organs. The split likely happened soon after the development of memory, which itself came soon-ish after the evolution of the brain. The worms in question were so primitive that one line started crawling backwards, eating out of its anus and pooping out of its mouth; this flip actually defines the deuterostomes vs. protostomes (we're deuterostomes, bees are protostomes). It's not clear that "has subjective feelings" dates back to such early brain designs. If it arose later, then bees are either not conscious or have independently-evolved consciousness, which would be nearly impossible to reason about.
I don't see why "juvenile bees sometimes roll a ball" (which gets translated into "bees play") should weigh so significantly in our considerations here. It's kind of ridiculous. Either you think bees should be weighted highly, or you think they should be given little weight, but why is "juveniles roll a ball!" such strong evidence of the amount of pain or pleasure they feel?
Yet “roll a ball” (and a few similar proxies) is the only thing the OP is going by. There’s nothing else in the model!
Ah, I’d also recommend Bob’s Don’t Balk at Animal-friendly Results in this series.
I agree with all of this, although I think this bears primarily on whether they’re sentient at all, not really on their hedonic welfare range conditional on sentience. I don’t think bees are extremely unlikely to be sentient based on the evidence I have seen and my intuitions about consciousness.
I used to believe that hedonic welfare ranges should very probably scale (sublinearly) with neuron counts, but I’ve become pretty skeptical that they should scale with neuron counts at all based on RP’s work:
Adam Shriver’s post on (mostly against) neuron counts.
Our post on (mostly against) conscious subsystems.
I’d recommend those posts. I also have some more thoughts against superlinear scaling in particular relative to sublinear scaling not covered directly in those two posts, but I’ll put them in a reply to this comment, which is already very long.
I would guess that it did arise independently after our last common ancestor if bees are conscious (similarly for cephalopods). I agree that it makes it much harder to reason about, but I don’t think this gives us more reason to believe that bees have (much) narrower ranges than that they have (much) larger ranges, conditional on their capacity to suffer or experience pleasure at all. Incomparability is another possibility.
Also, you can be guided by more general accounts of or intuitions about consciousness and suffering, or even rough candidate functionalist definitions of hedonic intensity, e.g. trying to generalize Welfare Footprint Project’s.
I’m pretty sympathetic to this, and I’m not very sympathetic to most of the models, other than the neuron count model, the JND model and the equality model. It’s hard for me to see why most of the proxies used would matter, conditional on sentience.
I could imagine “play behavior” mattering if we could code its intensity, either absolutely or relatively, similar to Welfare Footprint Project definitions for pain levels, but this isn’t really what the models here do, and I’d imagine displaying play behavior not really telling us much about intensity anyway. I think panic-like behavior could be a decent indicator for pretty intense suffering (disabling and maybe even excruciating according to WFP) conditional on sentience, so I’d probably give animals without it much narrower welfare ranges.
PTSD-like behavior could be another, but I’d give it less weight, since it seems more likely to be biased either way.
Thanks for your reply. I agree with much of what you write. Below are some disagreements.
This seems to be framing consciousness as a binary, a yes/no. That sounds wrong; many people view it as a sliding scale, and some of your links talk about “more valenced consciousness” etc.
In any event, I understood the 14 bees = 1 human to be after accounting for the low chance of bee sentience. Did I misunderstand? The summary figure lists various disclaimers, but it notably does NOT say “conditioned on sentience”.
One could write equally convincing arguments against “do juveniles roll a ball” as a proxy. Neuron counts are bad and have many flaws; “do they roll a ball” is WORSE and has MORE flaws. That remains the case if you add 100 other subjective proxies, all wishy-washy things, all published by bee researchers eager to tell you bees are amazing.
It remains the case that neuron counts are more objective than any of the other proxies. It also is the case that NOT using neuron counts is a guarantee of not getting tiny estimates: there’s no clear way to combine 100 yes/no proxies and get an answer like “one-millionth of a human”. Only with neuron counts can you get this. I could tell you that even before looking at your results: your methodology eliminates a whole range of answers a priori.
(Also, the argument that bees are amazing is used in Shriver’s post, which makes this discussion circular: he doesn’t want to use neuron counts because it underestimates bees (according to intuition, I guess).)
Compare: if each server on the internet is connected to only 10 other servers on average, does it hold that each user of the internet can only reach a constant number of websites?
No: the graph is an expander, and if there are n nodes, the distance between any two nodes may be as little as O(log n), even if the degree of each node is constant. Hence, via a few hops on the graph, a node may talk to many other nodes (potentially even all of them).
Neural networks (whether artificial or natural) can certainly cause interaction between far away neurons, and the O(log n) distance does not necessarily mean the signal dies. This is similar to how my words reach you despite the packets passing through many servers along the way.
I don’t know for sure that this is how the brain works. However, I find it plausible. I also note that the human brain achieves an incredible amount of intelligence, so certain impressive interactions between neurons are definitely taking place.
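The small-world claim is easy to check empirically. Here is a standalone sketch (not from the post): build a random graph in which each node draws 10 random neighbours, then measure distances with breadth-first search. If the claim is right, the maximum hop count should grow by roughly a constant each time n grows by 10x, rather than growing by 10x itself.

```python
import random
from collections import deque

def random_graph(n, degree, seed=0):
    """Each node draws `degree` random neighbours; edges are undirected."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u in range(n):
        while len(adj[u]) < degree:
            v = rng.randrange(n)
            if v != u:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def bfs_distances(adj, start=0):
    """Hop counts from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

for n in (1_000, 10_000, 100_000):
    dist = bfs_distances(random_graph(n, degree=10))
    print(f"n={n}: reached {len(dist)} nodes, max distance {max(dist.values())}")
```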
I definitely do think welfare ranges can vary across beings, so I’m not thinking in binary terms.
~14 bees to 1 human is indeed after adjusting for the probability of sentience.
Neuron counts are plausibly worse than all of the other proxies precisely because of how large the gaps in welfare range they imply are. The justifications could be bad for most or all proxies, and maybe even worse for others than for neuron counts (although I do think some of the proxies are far more justified), but neuron counts could introduce the most bias in the way they're likely to be used. A proxy like "whether or not they have a heart", or literally no proxies at all, would give more plausible ranges than neuron counts, conditional on sentience (having a nonzero welfare range at all).
The kinds of proxies I'd use (proxies for the functions of valence and for how those functions vary with hedonic intensity) would probably give results more similar to any of the non-neuron-count models than to the neuron count model (or models with larger gaps). A decent approximation (a lower bound) of the expected welfare range ratio over humans would be the probability that the animal has states of similar hedonic intensity to the most intense in humans, based on behavioural markers of intensity and whether they have the right kinds of cognitive mechanisms. And I can't imagine assigning tiny probabilities to that, conditional on sentience, based on current evidence (which is mostly missing either way). For bees, this report estimated a 42.5% probability of sentience, so a 16.7% chance of having similarly intense hedonic states conditional on sentience would give you 14 bees per human. I wouldn't go lower than 1% or higher than 80% based on current evidence, so 16.7% wouldn't be that badly off. (This is all assuming the expected number of conscious/valenced systems in any brain is close to 1 or lower, or their correlation is very low, or we can ignore that possibility for other reasons.)
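Spelling out the arithmetic in that example:

```python
p_sentience = 0.425               # estimated probability of bee sentience, from the report
p_intense_given_sentient = 0.167  # chance of similarly intense hedonic states, given sentience

expected_ratio = p_sentience * p_intense_given_sentient
print(expected_ratio)      # ~0.071: expected welfare range relative to humans
print(1 / expected_ratio)  # ~14.1, i.e. roughly 14 bees per human
```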
Wrt packets sent along servers: servers are designed to be very reliable, have buffers in case multiple or large packets arrive within a short period, and so on. I'd guess neural signals compete much more with each other, and at each neuron they reach they have a non-tiny chance of not being passed along, so you get decaying signal strength. Many things don't make it to your conscious awareness. On the other side, there may be multiple similar signals through multiple paths in a brain, but that means more competition between distinct signals, too. Similar signals being sent across multiple paths may also partly reflect more neurons directly connected to the periphery firing, not just a few neurons each influencing a superlinear number of neurons on average.
If I’m reading this right, you are dismissing neuron counts because of your intuition. You correctly realize that intuitions trump all other considerations, and the game is just to pick proxies that agree with your intuitions and allow you to make them more systematic.
I agree with this approach but strongly disagree with your intuitions. “14 bees = 1 human, after adjusting for probability of sentience” is so CLEARLY wrong that it is almost insulting. That’s my intuition speaking. I’m doing the same thing you’re doing when you dismiss neuron counts, just with a different starting intuition than you.
I think it would be better if the OP was more upfront about this bias.
That’s not what I meant.
I’m not dismissing neuron counts because of my direct intuitions about welfare ranges across species. That would be circular, motivated reasoning and an exercise in curve fitting. I’m dismissing them because the basis for their use seems weak, for reasons explained in posts RP has written and my own (vague) understanding of what plausibly determines welfare ranges in functionalist terms. When RP started this project and for most of the time I spent working on the conscious subsystems report, I actually thought we should use neuron counts by default. I didn’t change my mind about neuron counts because my direct intuitions about relative welfare ranges between specific species changed; I changed my mind because of the arguments against neuron counts.
What I meant in what you quoted is that neuron counts seem especially biased, where the biases are measured relative to the results of quantitative models roughly capturing my current understanding of how consciousness and welfare ranges actually work, like the one I described in the comment you quoted from. Narrow range proxies give less biased results (relative to my models’ results) than neuron counts, including such proxies with little or no plausible connection to welfare ranges. But I’d just try to build my actual model directly.
How exactly are you thinking neuron counts contribute to hedonic welfare ranges, and how does this relate to your views on consciousness? What theories of consciousness seem closest to your views?
Why do you think 14 bees per human is so implausible?
(All this being said, the conscious subsystems hypothesis might still support the use of neuron counts as a proxy for expected welfare ranges, even if the hypothesis seems very unlikely to me. I’m not sure how unlikely; I have deep uncertainty.)
Some thoughts against superlinear scaling in particular relative to sublinear scaling not covered directly in those two posts:
If we count multiple conscious subsystems in a brain, even allowing substantial overlap between them, in order to get superlinear scaling (i.e., scaling substantially faster than linear), that seems likely to imply "double counting" valenced experiences, and my guess is that this would get badly out of hand, e.g. into exponential territory, which would also have counterintuitive implications. I discuss this here.
Humans don't seem to have many times more synapses per neuron than bees (1,000 to 7,000 in human brains vs ~1,000 in honeybee brains, based on data in [1] and [2]), so the number of direct connections between neurons scales roughly in proportion to neuron counts between humans and bees. We could have many times more indirect connections per neuron through paths of connections, but the influence of one neuron on another to which it is only indirectly connected should decrease with the length of the path between them, because the signal has to travel farther and compete with more signals. This doesn't rule out superlinear scaling, but it can limit it.
A brain duplication thought experiment here.
Multiple other arguments here.
This is extremely interesting and thought-provoking, but bees beating salmon really does undermine any attempt I can make to give this a lot of credence.
More than that, though, I object to saying we can trade one week of human life for six days of chicken torture (in the comments). But this is more a critique of utilitarianism, as I lay out in "Biting the Philosophical Bullet" here.
Thanks, Matt. As we say, though, we don't actually think that bees beat salmon. We think that the vertebrates are at 0.1 or above relative to humans, that the vertebrates themselves are within 2x of one another, and that the invertebrates are within 2 OOMs of the vertebrates. We fully recognize that the models are limited by the available data about specific taxa. We aren't going to fudge the numbers to get more intuitive results, but we definitely don't recommend using them uncritically.
I hear—and sometimes share—your skepticism about such human/animal tradeoffs. As we argue in a previous post, utilitarianism is indeed to blame for many of these strange results. Still, it could be the best theory around! I’m genuinely unsure what to think here.